Summary of Epigenetic Databases

CD Genomics Blog


Posted on January 5, 2024

Genetics has traditionally focused on changes in gene expression caused by alterations in the DNA sequence, such as gene mutations, loss of heterozygosity, and microsatellite instability. Epigenetics, by contrast, studies changes in gene expression that arise without any change in the DNA sequence, such as DNA methylation and alterations in chromatin conformation. Epigenomics extends this to the genome-wide study of epigenetic alterations.

Introduction to Epigenetic Databases

DNA Methylation Databases

MethDB

URL: http://www.methdb.de

Function: MethDB contains methylation patterns, methylation profiles, and total methylation content data for a range of tissues and phenotypes. About 14% of its data have not been published elsewhere. The data can be retrieved easily, and an online submission tool lets researchers deposit new data directly into MethDB.
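To make the difference between per-site methylation patterns and overall methylation content concrete, here is a minimal Python sketch. It assumes bisulfite-sequencing read counts (methylated and unmethylated reads per CpG); the input layout, function names, and example values are hypothetical, and MethDB also stores data from other assays that are not computed this way.

```python
from typing import Dict, Tuple

# Hypothetical input: CpG position -> (methylated reads, unmethylated reads)
CpGCounts = Dict[int, Tuple[int, int]]


def site_levels(counts: CpGCounts, min_depth: int = 1) -> Dict[int, float]:
    """Per-CpG methylation level: methylated / (methylated + unmethylated).

    CpGs covered by fewer than `min_depth` reads are skipped.
    """
    levels = {}
    for pos, (meth, unmeth) in counts.items():
        depth = meth + unmeth
        if depth >= min_depth:
            levels[pos] = meth / depth
    return levels


def overall_content(counts: CpGCounts) -> float:
    """Overall methylation content: methylated reads pooled across all CpGs."""
    total_meth = sum(meth for meth, _ in counts.values())
    total = sum(meth + unmeth for meth, unmeth in counts.values())
    return total_meth / total if total else 0.0


if __name__ == "__main__":
    example = {101: (9, 1), 150: (2, 8), 230: (0, 0), 310: (5, 5)}
    print(site_levels(example))      # {101: 0.9, 150: 0.2, 310: 0.5}
    print(overall_content(example))  # 16 / 30, about 0.533
```

Pooling reads weights well-covered CpGs more heavily; averaging the per-site levels instead is an equally reasonable choice, depending on the question being asked.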

PubMeth

URL: http://www.pubmeth.org

Function: PubMeth collects cancer-related methylation data from the literature. The data are manually curated and annotated, yielding a high-quality database of genes reported to be methylated in cancer. The database can only be searched online; direct downloads are not offered.

MethyCancer

URL: http://methycancer.psych.ac.cn/

Function: Cancer is one of the leading threats to human health and a major focus of research. Because DNA methylation plays a crucial role in tumors, the MethyCancer database integrates human DNA methylation and cancer data and links cancers to the relevant genes.

The database is built on highly curated DNA methylation data, cancer-related genes, mutations, and cancer information from public resources, supplemented by CpG island (CGI) clone data that the database developers generated through large-scale sequencing. These data types are integrated, and the relationships among them are annotated. The database provides search tools and MethyView, a user-friendly graphical interface, which give easy access to all data and their interconnections and let users explore DNA methylation alongside genomic and genetic data.

MethSurv

URL: https://biit.cs.ut.ee/methsurv/

Function: MethSurv draws on TCGA methylation data, most of which were generated on the Illumina 450K array, with some earlier samples profiled on the 27K array. It is designed for researchers and clinicians without bioinformatics or programming experience and serves mainly as a tool for exploring methylation biomarkers associated with the survival of cancer patients. MethSurv performs survival analysis for CpGs located within or near a queried gene, and it covers methylation data from 7,358 samples across 25 human cancer types.
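The core question MethSurv answers is whether patients with high methylation at a CpG survive differently from those with low methylation. As a simplified illustration (not MethSurv's implementation), the sketch below splits a hypothetical cohort at the median beta value and computes a Kaplan-Meier survival curve for each group; the record layout and all values are made up.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical patient record: (beta value at one CpG, follow-up in months, death observed?)
Patient = Tuple[float, float, bool]


def split_by_median(patients: List[Patient]):
    """Split patients into high- and low-methylation groups at the median beta value."""
    cut = median(beta for beta, _, _ in patients)
    high = [(t, e) for beta, t, e in patients if beta > cut]
    low = [(t, e) for beta, t, e in patients if beta <= cut]
    return high, low


def kaplan_meier(group: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (event time, survival probability) at each death time."""
    data = sorted(group)
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Patients sharing a time point are processed together; censored ones
        # leave the risk set without lowering the survival estimate.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


if __name__ == "__main__":
    cohort = [(0.82, 5, True), (0.75, 8, True), (0.90, 12, False), (0.70, 20, True),
              (0.20, 30, False), (0.15, 24, True), (0.30, 40, False), (0.10, 36, False)]
    high, low = split_by_median(cohort)
    print("high methylation:", kaplan_meier(high))
    print("low methylation:", kaplan_meier(low))
```

In this toy cohort the high-methylation group loses patients earlier. A real analysis would add a formal comparison, such as a log-rank test or a Cox model, and adjust for clinical covariates.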

Epigenome-Wide Association Study (EWAS)

IHEC Data Portal

URL: https://epigenomesportal.ca/ihec/

Function: The IHEC Data Portal provides data from seven international consortia (ENCODE, NIH Roadmap, CEEHRC, Blueprint, DEEP, AMED-CREST, and KNIH), totaling more than 7,000 reference epigenomic datasets from over 600 distinct tissues. Users can browse and select datasets and compare their selections with the portal's built-in tools. The datasets can be downloaded, but access to raw data must be requested. Users can also share the datasets they have selected.

Blueprint Data Analysis Portal

URL: http://blueprint-data.bsc.es

Function: The Blueprint Data Analysis Portal is used to analyze data generated by the Blueprint Consortium, which produced reference epigenomes for hematopoietic cell lineages. The datasets include ChIP-seq, DNase-seq, whole-genome bisulfite sequencing, and RNA-seq data for more than 60 cell types. The portal is built on the Epigenomics Comparative Cyber-Infrastructure (EPICO) platform, which has five components: a data model; a data validation and loading program; an initially empty database that stores the data and metadata produced by these steps; an application programming interface (API); and the Data Analysis Portal itself. Beyond EPICO, a deployment also needs storage space for building the database, connections for retrieving raw data, and modules for receiving queries and returning results. The portal lets users, including those with little bioinformatics background, visualize and compare the epigenomic and transcriptomic data of hematopoietic cell types of interest.
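To show how the first EPICO components fit together (data model, validation and loading, an initially empty database, and an API the portal calls), here is a toy in-memory sketch in Python. Every class, field, and function name is hypothetical and does not reflect EPICO's actual schema or API; the real platform uses a persistent database rather than a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical data model: one record per processed dataset (not EPICO's real schema)
@dataclass(frozen=True)
class Dataset:
    dataset_id: str
    cell_type: str
    assay: str
    file_uri: str


ALLOWED_ASSAYS = {"ChIP-seq", "DNase-seq", "WGBS", "RNA-seq"}


class EpigenomeStore:
    """Stands in for the database component: it starts empty and is filled by load()."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Dataset] = {}

    def load(self, records: List[dict]) -> List[str]:
        """Validate raw metadata records, store the valid ones, and return error messages."""
        errors = []
        for rec in records:
            try:
                ds = Dataset(**rec)
            except TypeError as exc:  # missing or unexpected fields
                errors.append(f"invalid record {rec!r}: {exc}")
                continue
            if ds.assay not in ALLOWED_ASSAYS:
                errors.append(f"{ds.dataset_id}: unsupported assay {ds.assay!r}")
            elif ds.dataset_id in self._by_id:
                errors.append(f"{ds.dataset_id}: duplicate dataset id")
            else:
                self._by_id[ds.dataset_id] = ds
        return errors

    def query(self, cell_type: str, assay: Optional[str] = None) -> List[Dataset]:
        """API-style call a portal front end could use to fetch datasets for one cell type."""
        return [d for d in self._by_id.values()
                if d.cell_type == cell_type and (assay is None or d.assay == assay)]


if __name__ == "__main__":
    store = EpigenomeStore()
    problems = store.load([
        {"dataset_id": "D1", "cell_type": "monocyte", "assay": "WGBS", "file_uri": "d1.bw"},
        {"dataset_id": "D2", "cell_type": "monocyte", "assay": "ATAC-seq", "file_uri": "d2.bw"},
        {"dataset_id": "D3", "cell_type": "neutrophil", "assay": "RNA-seq"},
    ])
    print(problems)  # two messages: unsupported assay for D2, missing field for D3
    print([d.dataset_id for d in store.query("monocyte")])  # ['D1']
```

Keeping validation in the loader means the query layer can assume every stored record is well formed, which is the main reason to separate the two steps.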

eFORGE

URL: http://eforge.cs.ucl.ac.uk/

Function: eFORGE analyzes results from epigenome-wide association studies (EWAS) to help identify the cell types relevant to a given disease. It estimates in which tissues or cell types a set of differentially methylated sites is likely to be functional by testing the overlap between those sites and reference maps of DNase I hypersensitive sites (DHSs). The reference sets comprise 454 samples from diverse tissues, primary cell types, and cell lines from the ENCODE, Roadmap Epigenomics, and Blueprint consortia.
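The overlap test can be sketched for a single DHS reference sample: count how many query sites fall in DHSs, then compare that count with random background sets of the same size. This is a simplified illustration under stated assumptions, not eFORGE's code. It samples background sites uniformly, whereas the real tool controls for probe properties when drawing backgrounds, and it would be repeated for each of the reference samples. The coordinates are hypothetical.

```python
import random
from bisect import bisect_right
from typing import Dict, List, Tuple

Site = Tuple[str, int]                        # (chromosome, 0-based position)
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(dhs: Dict[str, List[Tuple[int, int]]]) -> Index:
    """Sort DHS intervals per chromosome. Intervals are half-open [start, end)
    and assumed to be merged (non-overlapping), so a binary search on starts suffices."""
    index = {}
    for chrom, intervals in dhs.items():
        intervals = sorted(intervals)
        index[chrom] = ([s for s, _ in intervals], [e for _, e in intervals])
    return index


def in_dhs(index: Index, site: Site) -> bool:
    starts, ends = index.get(site[0], ([], []))
    i = bisect_right(starts, site[1]) - 1     # last interval starting at or before pos
    return i >= 0 and site[1] < ends[i]


def overlap_count(index: Index, sites: List[Site]) -> int:
    return sum(in_dhs(index, s) for s in sites)


def enrichment(index: Index, query: List[Site], background: List[Site],
               n_sets: int = 1000, seed: int = 0) -> Tuple[int, float, float]:
    """Observed overlap, mean overlap of random same-size background sets,
    and an add-one empirical p-value for enrichment."""
    rng = random.Random(seed)
    observed = overlap_count(index, query)
    null = [overlap_count(index, rng.sample(background, len(query)))
            for _ in range(n_sets)]
    p_value = (1 + sum(n >= observed for n in null)) / (n_sets + 1)
    return observed, sum(null) / n_sets, p_value


if __name__ == "__main__":
    idx = build_index({"chr1": [(100, 200), (500, 600)]})
    dmps = [("chr1", 150), ("chr1", 550), ("chr1", 120)]
    background = [("chr1", p) for p in range(0, 1000, 10)]
    print(enrichment(idx, dmps, background))
```

A small p-value for one reference sample suggests the differentially methylated sites sit preferentially in that sample's open chromatin, which is the signal used to point to relevant cell types.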

iMETHYL

URL: http://imethyl.iwate-megabank.org/index.html

Function: iMETHYL integrates data on CD4+ T lymphocytes, monocytes, and neutrophils from approximately 100 subjects: whole-genome DNA methylation data (about 24 million autosomal CpG sites), whole-genome variant data (about 9 million nucleotide variants), and transcriptome data (more than 14,000 genes). Built from whole-genome bisulfite sequencing, whole-genome resequencing, and transcriptome sequencing, iMETHYL is a multi-omics database that unifies SNP, DNA methylation, and RNA expression data and computes correlations between each pair of data types.
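A pairwise correlation of the kind iMETHYL reports can be illustrated with the Pearson coefficient computed across subjects. The sketch below assumes one SNP (coded as 0/1/2 allele dosage), one nearby CpG, and one gene measured in six hypothetical subjects; iMETHYL's own statistical procedure is not reproduced here.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length measurement vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical values for six subjects in one cell type
    genotype = [0, 0, 1, 1, 2, 2]                       # SNP dosage (copies of the alternative allele)
    methylation = [0.85, 0.80, 0.60, 0.55, 0.30, 0.25]  # beta value at a nearby CpG
    expression = [2.0, 2.2, 4.1, 3.9, 6.0, 6.3]         # expression of the nearby gene
    print("SNP vs methylation:", round(pearson(genotype, methylation), 3))
    print("methylation vs expression:", round(pearson(methylation, expression), 3))
    print("SNP vs expression:", round(pearson(genotype, expression), 3))
```

With real data, a rank-based coefficient is often preferred for skewed expression values, and genome-wide testing of many SNP-CpG-gene pairs requires multiple-testing correction.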

m6A Databases

N6-methyladenosine (m6A) is the most abundant internal modification of messenger RNA in eukaryotes. It takes part in many biological processes, most notably alternative splicing, RNA degradation, and RNA-protein interactions, and thereby adds a layer of post-transcriptional regulation to gene expression.

WHISTLE

URL: http://180.208.58.19/whistle/index.html

Function: WHISTLE predicts m6A RNA methylation sites across the transcriptome. It combines sequencing data with machine learning to locate sites subject to m6A regulation. Users only need to enter a gene of interest to search. To examine how m6A methylation relates to a particular function, users can enter the name of that function.
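m6A sites are enriched at the DRACH consensus motif (D = A/G/U, R = A/G, H = A/C/U), so a natural first step in sequence-based m6A prediction is to enumerate candidate adenosines. The Python sketch below does only that: it is not WHISTLE's trained model or feature set, and not every DRACH site is actually methylated.

```python
import re
from typing import List

# DRACH consensus: D = A/G/U, R = A/G, then A, C, H = A/C/U.
# U is written as T so the pattern works on DNA/cDNA strings; the lookahead
# lets overlapping motifs be reported.
DRACH = re.compile(r"(?=([AGT][AG]AC[ACT]))")


def drach_sites(seq: str) -> List[int]:
    """0-based positions of the central (potentially methylated) A of each DRACH motif."""
    s = seq.upper().replace("U", "T")
    return [m.start() + 2 for m in DRACH.finditer(s)]


if __name__ == "__main__":
    print(drach_sites("GGACUGGACAUCGACG"))  # [2, 7]
```

A predictor would then score each candidate with additional information, which is where machine learning comes in.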

MeT-DB V2.0

URL: http://www.xjtlu.edu.cn/metdb2

Function: MeT-DB V2.0 collects PAR-CLIP-seq and MeRIP-seq data covering eight m6A regulators (FTO, KIAA1429, METTL14, METTL3, WTAP, HNRNPC, YTHDC1, and YTHDF1). Using these sequencing data, users can look up the specific sites targeted by each regulator. The database also provides results from comparative analyses with related miRNA and splicing factor databases.

REPIC

URL: https://repicmod.uchicago.edu/repic

Function: Launched in April 2020, REPIC (RNA EPItranscriptome Collection) compiles a large volume of sequencing data: nearly 10 million peaks called from publicly available m6A-seq and MeRIP-seq data in 672 samples from 49 studies, covering 61 cell lines or tissues from 11 organisms. Users can explore m6A sites by individual cell line or tissue type. REPIC also integrates the m6A/MeRIP-seq data with 1,418 ChIP-seq and 118 DNase-seq datasets from the ENCODE project, providing a comprehensive view of m6A methylation sites.
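Integrating m6A peaks with ChIP-seq tracks comes down to interval overlap. The sketch below, assuming BED-style half-open coordinates and a hypothetical `sample` label on every peak, returns the m6A peaks from one cell line that overlap at least one ChIP-seq peak from the same cell line. REPIC presents such overlaps visually in its genome browser; the record layout and function names here are illustrative only.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int   # 0-based, half-open [start, end) as in BED files
    end: int
    sample: str  # cell line or tissue label (hypothetical field)


def overlaps(a: Peak, b: Peak) -> bool:
    """True if the two half-open intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def m6a_peaks_with_chip_support(m6a: List[Peak], chip: List[Peak], sample: str) -> List[Peak]:
    """m6A peaks from one sample that overlap at least one ChIP-seq peak from that sample."""
    chip_here = [c for c in chip if c.sample == sample]
    return [p for p in m6a
            if p.sample == sample and any(overlaps(p, c) for c in chip_here)]


if __name__ == "__main__":
    m6a = [Peak("chr1", 100, 200, "HeLa"), Peak("chr1", 300, 400, "HeLa"),
           Peak("chr1", 100, 200, "HepG2")]
    chip = [Peak("chr1", 150, 250, "HeLa"), Peak("chr1", 400, 450, "HeLa")]
    print(m6a_peaks_with_chip_support(m6a, chip, "HeLa"))
    # [Peak(chrom='chr1', start=100, end=200, sample='HeLa')]
```

The book-ended pair (300-400 and 400-450) is deliberately not counted as overlapping under half-open coordinates. The nested loop is fine for illustration; genome-scale data would call for sorted intervals or an interval tree.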

m6A2Target

URL: http://m6a2target.canceromics.org/

Function: m6A2Target is a comprehensive database of target genes of m6A writers, erasers, and readers (collectively, WERs). It includes high-confidence targets validated by low-throughput experiments, as well as evidence from high-throughput sequencing such as CLIP-seq, RIP-seq, and ChIP-seq. It also includes potential targets inferred from perturbation experiments followed by high-throughput sequencing, including but not limited to RNA-seq, m6A-seq, and Ribo-seq.
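The simplest way to infer candidate targets from a perturbation experiment is to flag genes whose signal changes strongly after a WER is knocked down or overexpressed. The sketch below does this with a log2 fold change and a pseudocount; the function name, threshold, and gene values are hypothetical, and a real analysis would use replicates and a proper statistical test rather than a fold-change cutoff alone.

```python
from math import log2
from typing import Dict, List


def candidate_targets(control: Dict[str, float], perturbed: Dict[str, float],
                      min_abs_log2fc: float = 1.0, pseudocount: float = 1.0) -> List[str]:
    """Genes whose signal changes at least 2**min_abs_log2fc-fold after perturbing a WER.

    The pseudocount keeps the ratio finite for genes with zero signal.
    Only genes measured in both conditions are considered.
    """
    hits = []
    for gene in sorted(control.keys() & perturbed.keys()):
        lfc = log2((perturbed[gene] + pseudocount) / (control[gene] + pseudocount))
        if abs(lfc) >= min_abs_log2fc:
            hits.append(gene)
    return hits


if __name__ == "__main__":
    # Hypothetical normalized expression before and after knocking down a writer
    control = {"GENE_A": 10.0, "GENE_B": 10.0, "GENE_C": 10.0, "GENE_D": 5.0}
    knockdown = {"GENE_A": 30.0, "GENE_B": 12.0, "GENE_C": 3.0}
    print(candidate_targets(control, knockdown))  # ['GENE_A', 'GENE_C']
```

Such genes are only indirect candidates: a change in expression after perturbation does not by itself show that the WER acts on that transcript, which is why the database keeps these separate from experimentally validated targets.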

Other

HEDD

URL: http://zdzlab.einstein.yu.edu/1/hedd.php

Function: The Human Enhancer Disease Database (HEDD) provides comprehensive genomic information on about 2.8 million human enhancers identified by ENCODE, FANTOM5, and Roadmap Epigenomics. It assigns disease association scores based on the links among enhancers, genes, and diseases. The database also offers network-based tools for visualizing enhancer networks and for scoring enhancers against a user-selected set of genes in a given gene network.
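HEDD's own scoring scheme is not described here, but the idea of scoring enhancers through enhancer-gene links can be illustrated with two simple hypothetical rules: an enhancer inherits the strongest disease weight among its linked genes, and an enhancer's relevance to a selected gene set is the fraction of its linked genes in that set. All names and values below are made up.

```python
from typing import Dict, Set

# Hypothetical inputs: enhancer -> linked genes, and gene -> disease association weight in [0, 1]
EnhancerGenes = Dict[str, Set[str]]


def enhancer_disease_scores(links: EnhancerGenes, gene_weight: Dict[str, float]) -> Dict[str, float]:
    """Score each enhancer by the strongest disease association among its linked genes."""
    return {enh: max((gene_weight.get(g, 0.0) for g in genes), default=0.0)
            for enh, genes in links.items()}


def gene_set_scores(links: EnhancerGenes, gene_set: Set[str]) -> Dict[str, float]:
    """Fraction of each enhancer's linked genes that belong to a user-selected gene set."""
    return {enh: (len(genes & gene_set) / len(genes) if genes else 0.0)
            for enh, genes in links.items()}


if __name__ == "__main__":
    links = {"E1": {"GENE1", "GENE2"}, "E2": {"GENE3"}, "E3": set()}
    weights = {"GENE1": 0.9, "GENE2": 0.6}
    print(enhancer_disease_scores(links, weights))  # {'E1': 0.9, 'E2': 0.0, 'E3': 0.0}
    print(gene_set_scores(links, {"GENE1"}))        # {'E1': 0.5, 'E2': 0.0, 'E3': 0.0}
```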

EpiFactors

URL: http://epifactors.autosome.ru/

Function: Research on epigenetic mechanisms such as DNA methylation, histone modification, and chromatin remodeling has advanced rapidly and produced a wealth of knowledge. EpiFactors organizes and summarizes the genes and functions of the proteins involved in these mechanisms, and it goes beyond the proteins directly involved in epigenetic regulation. It introduces the concept of "epigenetic factors," which links the core epigenetic proteins with their peripheral partners and thereby gives a richer picture of the molecular network underlying epigenetics.

What We Provide

As a leading provider of genomic research services, CD Genomics offers an Epigenomics Data Analysis Service that gives scientists the tools to decipher complex patterns in epigenetic data. Our team of experienced bioinformaticians continually improves our pipelines to deliver biologically meaningful results on tight timelines. Using advanced algorithms, computational methods, and machine learning, CD Genomics helps investigators draw meaningful conclusions from their epigenomic data.

Chromatin studies (ChIP-Seq, ATAC-Seq)

Methylation studies (WGBS, RRBS, TBS, EPIC, MeDIP-Seq)

RNA-Seq

Peak Calling and Annotation

Exploratory analysis

Differential peak analysis

Transcription factor binding site analyses

Integrative analysis of multiple omics data

Data mining with machine learning techniques