Introduction: What is KEGG and Why is it Important for Researchers
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for interpreting genomic and other molecular data in terms of higher-level biological functions. The project was initiated in 1995 by Minoru Kanehisa at Kyoto University and is now developed by Kanehisa Laboratories. It has since grown into a comprehensive reference for studying complex biological systems and is widely used by research groups worldwide.
KEGG Overview. (Image source: KEGG official website, https://www.genome.jp/kegg/kegg1a.html)
Architectural Components
Two fundamental database segments define KEGG’s computational infrastructure:
1. KEGG ORTHOLOGY (KO) is a system that classifies genes based on their similar functions across different biological systems. By categorizing genes into Ortholog Groups, researchers can:
- Identify conserved genetic functionalities
- Facilitate comparative genomic investigations
- Enable cross-species molecular analysis
2. KEGG PATHWAY: a collection of manually drawn pathway maps of molecular interactions, reactions, and relations, grouped into six primary categories (a seventh, Drug Development, covers drug structure maps):
- Cellular Processes
- Environmental Information Processing
- Genetic Information Processing
- Human Diseases
- Metabolism
- Organismal Systems
Organized Pathway Structure
KEGG’s pathway maps are organized in a three-level hierarchy:
- Primary Level: the top-level categories listed above
- Secondary Level: 43 subcategories (for example, Carbohydrate metabolism within Metabolism)
- Tertiary Level: individual pathway maps with their gene, compound, and interaction annotations
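The hierarchy can be pictured as a nested mapping from category to subcategory to individual map. The sketch below is illustrative only: `PATHWAY_HIERARCHY` and `locate_pathway` are hypothetical names, and the mapping holds just a few well-known maps, not the full KEGG classification.

```python
# Illustrative subset of the KEGG PATHWAY hierarchy:
# top-level category -> subcategory -> {map ID: pathway name}.
PATHWAY_HIERARCHY = {
    "Metabolism": {
        "Carbohydrate metabolism": {
            "map00010": "Glycolysis / Gluconeogenesis",
            "map00020": "Citrate cycle (TCA cycle)",
        },
    },
    "Genetic Information Processing": {
        "Translation": {
            "map03010": "Ribosome",
        },
    },
}


def locate_pathway(map_id):
    """Return (category, subcategory, name) for a map ID, or None if absent."""
    for category, subcategories in PATHWAY_HIERARCHY.items():
        for subcategory, pathways in subcategories.items():
            if map_id in pathways:
                return category, subcategory, pathways[map_id]
    return None
```

Calling `locate_pathway("map03010")` walks the three levels and returns the category, subcategory, and name of the Ribosome map.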
Why is KEGG Important for Researchers?
For contemporary researchers, KEGG transcends traditional database functionality. It provides:
- Integrated genomic and functional insights
- Systematic exploration of molecular interactions
- Visual representations of biochemical networks
- Comprehensive disease mechanism analysis
By linking genomic sequences to functional interpretation, KEGG supports investigation across disciplines. It serves not merely as a data repository but as a framework for relating genetic complexity to biological function.
Overview of the KEGG Database
The KEGG database is a collection of interlinked databases used throughout biological and biomedical research. Its components are listed below, grouped by the type of information they hold.
A conceptual diagram of the KEGG NETWORK database. (Kanehisa, M., et al., 2019)
Classification and Database Categories
The constituent databases under KEGG can be categorized as follows:
Classification Type | Database | Description |
---|---|---|
System Information | KEGG PATHWAY | KEGG Metabolic Pathway Maps |
System Information | KEGG BRITE | BRITE Functional Hierarchies |
System Information | KEGG MODULE | KEGG Functional Unit Modules |
Genomic Information | KEGG ORTHOLOGY | KEGG Ortholog Groups (KO) |
Genomic Information | KEGG GENOME | Species with Complete Genomes in KEGG |
Genomic Information | KEGG GENES | Catalog of Genes in Complete Genomes |
Genomic Information | KEGG SSDB | KEGG Sequence Similarity Database |
Chemical Information | KEGG COMPOUND | Metabolites and Other Small Molecules |
Chemical Information | KEGG GLYCAN | Polysaccharides |
Chemical Information | KEGG REACTION | Biochemical Reactions |
Chemical Information | KEGG ENZYME | Enzyme Nomenclature |
Health Information | KEGG DISEASE | Diseases |
Health Information | KEGG DRUG | Drugs |
Health Information | KEGG ENVIRON | Health-Related Substances |
Health Information | KEGG NETWORK | Disease-Related Network Elements |
Health Information | KEGG MEDICUS | Health Information Resources |
Health Information | JAPIC | Japan Pharmaceutical Information Center Database |
Health Information | DailyMed | FDA Drug Database (Link Only) |
Specifics of the KEGG Pathway Database
The KEGG PATHWAY database is a pivotal and widely consulted section of the KEGG resource, representing a comprehensive collection of biological pathways. These pathways offer deep insights into metabolic networks, disease mechanisms, and other biological phenomena. Each pathway is assigned a distinct identifier and is classified into specific types, each representing unique aspects of biological data:
Pathway Type | Description | Example |
---|---|---|
map | Reference pathways that summarize well-established biological knowledge. | map00010 (Glycolysis / Gluconeogenesis) |
org | Organism-specific pathways, in which KO nodes are replaced by the corresponding genes of that organism; the prefix is the organism code. | hsa00010 (Glycolysis / Gluconeogenesis in human) |
ko | Reference pathways in which each node is a KO entry (ortholog group). | ko00010 (Glycolysis / Gluconeogenesis) |
ec | Reference metabolic pathways in which each node is an enzyme (EC number). | ec00010 (Glycolysis / Gluconeogenesis) |
rn | Reference metabolic pathways in which each node is a biochemical reaction. | rn00010 (Glycolysis / Gluconeogenesis) |
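Because every pathway identifier is a short prefix followed by a five-digit number, the pathway type can be read off the identifier itself. The following is a minimal sketch, not part of any KEGG tooling: the function names are hypothetical, and the regular expression assumes that organism codes consist of two to four lowercase letters.

```python
import re

# Fixed prefixes for reference pathways; any other short lowercase prefix
# is treated as a KEGG organism code (for example "hsa" for human).
REFERENCE_PREFIXES = {
    "map": "reference pathway",
    "ko": "KO pathway",
    "ec": "EC pathway",
    "rn": "reaction pathway",
}

# Assumption: a prefix of 2-4 lowercase letters followed by five digits.
_PATHWAY_ID = re.compile(r"^([a-z]{2,4})(\d{5})$")


def parse_pathway_id(pathway_id):
    """Split an ID such as 'hsa00010' into (prefix, number, pathway type)."""
    match = _PATHWAY_ID.match(pathway_id)
    if match is None:
        raise ValueError(f"not a KEGG pathway identifier: {pathway_id!r}")
    prefix, number = match.groups()
    kind = REFERENCE_PREFIXES.get(prefix, "organism-specific pathway")
    return prefix, number, kind


def to_reference_map(pathway_id):
    """Return the reference map ID that shares the same five-digit number."""
    _, number, _ = parse_pathway_id(pathway_id)
    return "map" + number
```

For instance, `to_reference_map("hsa00010")` gives `"map00010"`, which reflects how organism-specific and reference versions of a pathway share one number.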
Key Types of Data Available in the KEGG Database
The KEGG database represents a sophisticated computational infrastructure for biological research, strategically organized into multifaceted investigative components. Each segment provides unique insights into molecular complexity:
PATHWAY Component
A pivotal repository of graphical molecular interaction representations, the PATHWAY module comprehensively documents metabolic, signaling, and physiological processes. These intricate visual mappings enable researchers to explore sophisticated biochemical interactions and their fundamental biological contributions.
BRITE Hierarchical Classification
This innovative organizational system categorizes biological functions through hierarchically structured classifications. BRITE facilitates sophisticated management of expansive genomic datasets, providing nuanced insights into molecular entity interactions across comprehensive biological systems.
MODULE Subpathway Analysis
KEGG MODULE defines functional units of gene sets, such as the enzymes of a conserved reaction sequence, that can be checked for completeness in an annotated genome. Researchers use modules to identify core reaction mechanisms shared across diverse organisms and to compare metabolic capabilities between genomes.
GENES Comprehensive Repository
An extensive catalog documenting genetic information from multiple organisms, the GENES section provides:
- Detailed functional annotations
- Associated molecular pathway connections
- Comprehensive sequence data
This resource serves critical roles in:
- Genome-wide association studies
- Advanced genotyping investigations
- Molecular characterization efforts
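Gene-to-pathway connections of this kind are commonly exchanged as two-column, tab-delimited text. The sketch below groups such records by gene. The sample lines are hand-written in that layout for illustration (they are not retrieved output), and `group_pathways_by_gene` is a hypothetical helper name.

```python
from collections import defaultdict

# Hand-written sample in a two-column, tab-delimited layout
# (gene ID, pathway ID); not output retrieved from KEGG.
SAMPLE_LINKS = (
    "hsa:3098\tpath:hsa00010\n"
    "hsa:3098\tpath:hsa00051\n"
    "hsa:5213\tpath:hsa00010\n"
)


def group_pathways_by_gene(text):
    """Map each gene ID to the sorted list of pathway IDs linked to it."""
    pathways = defaultdict(set)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene, pathway = line.split("\t")
        # Drop the "path:" database prefix, keeping only the pathway ID.
        pathways[gene].add(pathway.split(":", 1)[-1])
    return {gene: sorted(ids) for gene, ids in pathways.items()}
```

Inverting the same mapping (pathway to genes) yields the gene sets that pathway enrichment tools work with.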
By synthesizing these computational components, KEGG provides researchers with a robust analytical framework, supporting sophisticated investigations across genomic, systemic, and molecular research domains.
Guide to Downloading Data from the KEGG Database
Downloading data from the KEGG Database is a systematic process tailored for researchers requiring structured datasets for in-depth biological analysis. Here is a concise guide to navigate this procedure:
Step 1: Access the KEGG Website
Begin by accessing the official KEGG portal. This website is the central access point for the diverse databases and resources that KEGG offers.
Step 2: Select the Appropriate Database or Dataset
Upon reaching the KEGG homepage, identify and select the database or dataset of interest. Options range from KEGG PATHWAY and KEGG GENOME to KEGG DISEASE, among others. Navigation through specific categories can be accomplished by selecting the relevant links within the KEGG Databases section.
Step 3: Utilize KEGG’s Download Tools
KEGG supports several retrieval routes, depending on the data required:
- REST API: The KEGG API at https://rest.kegg.jp provides scripted access through operations such as info, list, find, get, conv, and link. It suits targeted queries; the get operation returns at most 10 entries per request, so it is not designed for bulk downloads.
- FTP Download: Bulk download of the complete database is offered over FTP (File Transfer Protocol), which requires a paid subscription.
- Download Scripts: Client libraries that wrap the REST API, such as KEGGREST for R and the Bio.KEGG module of Biopython, simplify retrieval of particular data types such as pathway maps, gene sequences, or chemical data.
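As a rough illustration of the REST route, a request is simply a URL made of an operation name followed by its arguments. The sketch below assumes the base URL https://rest.kegg.jp; the helper names `build_kegg_url` and `fetch_kegg` are hypothetical and not part of any KEGG client.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the KEGG REST service.
KEGG_REST_BASE = "https://rest.kegg.jp"


def build_kegg_url(operation, *arguments):
    """Join an operation and its arguments into a KEGG REST URL."""
    # Keep ":" (database prefix) and "+" (entry separator) unescaped.
    parts = [quote(str(part), safe=":+") for part in (operation, *arguments)]
    return "/".join([KEGG_REST_BASE, *parts])


def fetch_kegg(operation, *arguments, timeout=30):
    """Download one REST response as text (requires network access)."""
    url = build_kegg_url(operation, *arguments)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With these helpers, `fetch_kegg("list", "pathway", "hsa")` would request the list of human pathways. Automated scripts should pace their requests and respect KEGG's usage conditions.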
Step 4: Select the Desired Data Format
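The available formats depend on the data type:
- Flat Files: KEGG's default plain-text format, in which each entry is a series of labeled fields (ENTRY, NAME, and so on) terminated by a "///" line. Results of the list, find, conv, and link operations are tab-delimited text.
- KGML: An XML representation of pathway maps, suited to programmatic processing of pathway nodes and relations.
- JSON: Offered for BRITE hierarchy files and convenient for programmatic manipulation.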
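KEGG flat-file entries put a field label in the first 12 columns of a line, continue a field on lines that begin with spaces, and end the entry with `///`. A minimal parsing sketch under those assumptions follows; the sample is a hand-abbreviated entry rather than a complete record, and `parse_flat_entry` is a hypothetical name.

```python
# Hand-abbreviated entry in KEGG flat-file layout (not a complete record).
SAMPLE_ENTRY = """\
ENTRY       map00010                    Pathway
NAME        Glycolysis / Gluconeogenesis
CLASS       Metabolism; Carbohydrate metabolism
MODULE      M00001  Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
            M00002  Glycolysis, core module involving three-carbon compounds
///
"""


def parse_flat_entry(text):
    """Collect the lines under each field label of one flat-file entry."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("///"):
            break  # end-of-entry marker
        # Assumption: the label occupies the first 12 columns.
        label, value = line[:12].strip(), line[12:].strip()
        if label:
            current = label  # a new field begins
        if current is not None and value:
            fields.setdefault(current, []).append(value)
    return fields
```

Each field maps to a list of lines, so multi-line fields such as MODULE keep one item per line. Indented sub-field labels that occur in some record types are treated here as ordinary labels, which is a simplification.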
Step 5: Download and Extract the Data
After choosing a format, download the data. For FTP or script-based downloads, follow the accompanying instructions for extracting and organizing the files. Make sure your system has the tools and storage to handle large datasets, which matters most for genome-wide or pathway map data.
Step 6: Conduct Data Analysis
After download, use bioinformatics tools to analyze the data, for example to examine metabolic pathways, annotate gene functions, or support disease modeling studies.
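One common analysis at this stage is a pathway over-representation test, which asks whether a gene list contains more members of a pathway than chance would predict. The sketch below uses the hypergeometric distribution for this; it is one standard choice, not a method prescribed by KEGG, and the function name and the numbers in the usage note are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, selected, in_pathway, background):
    """Upper-tail hypergeometric probability P(X >= hits).

    `selected` genes are drawn without replacement from `background`
    genes, of which `in_pathway` belong to the pathway of interest.
    """
    total = comb(background, selected)
    upper = min(selected, in_pathway)
    tail = sum(
        comb(in_pathway, k) * comb(background - in_pathway, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For a toy case of 10 background genes, 4 of them in the pathway, and 3 selected genes of which 2 fall in the pathway, `enrichment_p_value(2, 3, 4, 10)` evaluates to 40/120, or about 0.33. Real analyses test many pathways at once and therefore also need a multiple-testing correction.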
This guide is crafted to streamline the process of data acquisition from KEGG, ensuring an efficient pathway from data selection to analytical application in genomic or biomedical research.
(A) Pathway analysis based on the KEGG database. (B) Enrichment analysis based on SMPDB. (C) Metabolic network of the crucial metabolites and significant metabolic pathways in the KEGG general metabolic pathway map. (Zhuang, F., et al., 2022)
Applications of KEGG Data in Molecular Research
The KEGG database emerges as a critical computational resource for researchers investigating complex biological systems across genomics, pharmacology, and systems biology. By offering comprehensive molecular interaction maps and annotated pathway information, KEGG enables sophisticated analytical approaches in contemporary life sciences.
1. Pharmaceutical Target Identification
KEGG pathways play a pivotal role in pharmaceutical research, facilitating the systematic exploration of potential therapeutic interventions. Researchers leverage these intricate molecular network representations to:
- Identify candidate molecular targets for therapeutic development
- Analyze potential drug repurposing strategies
- Comprehend intricate drug-target interaction mechanisms
A landmark study by Chen et al. (2015) demonstrated how pathway enrichment analysis could categorize pharmacological targets based on their underlying biological functionalities, providing researchers with a robust conceptual framework for drug discovery.
2. Disease Mechanism Elucidation
The database helps map molecular networks, which is key to understanding disease mechanisms. By integrating genetic variation data with signaling pathway information, researchers can:
- Visualize genetic perturbations within molecular networks
- Identify potential biomarkers
- Understand disease progression at the molecular level
Kanehisa et al. (2019) introduced the KEGG NETWORK database, which enables sophisticated visualization of how genetic variations influence cellular signaling pathways.
3. Metabolomics and Genomic Integration
KEGG helps connect genomic data with metabolic processes. Researchers utilize the database to:
- Interpret high-throughput experimental data
- Map metabolic pathways across diverse biological systems
- Correlate genetic information with metabolic functionalities
Kanehisa (2016) described the database’s utility in plant genomics and metabolomics, demonstrating its versatility across biological domains.
KEGG pathway analysis of proteomics data. (Li, Z., et al., 2020)
4. Omics Data Synthesis
Advanced bioinformatics tools now facilitate more comprehensive analyses by integrating KEGG data with multiple omics datasets. Innovative approaches, such as the "ggkegg" package introduced by Sato et al. (2023), enable:
- Enhanced visualization of complex biological networks
- Simultaneous analysis of transcriptomic and proteomic data
- Streamlined pathway enrichment investigations
5. Oncological Research Applications
In cancer research, KEGG pathways provide crucial insights into tumorigenesis and disease progression. Researchers like Kim et al. (2018) have developed specialized systems, such as BRCA-Pathway, which:
- Integrate genomic cancer databases
- Visualize signaling network alterations
- Enhance understanding of molecular mechanisms underlying cancer development
6. Computational Analysis Advancements
The emergence of specialized bioinformatics tools has significantly enhanced KEGG data analysis capabilities. Recent developments, exemplified by Pedersen et al. (2023), include:
- Creation of dedicated analysis packages
- Improved pathway visualization techniques
- Simplified enrichment analysis protocols
KEGG goes beyond traditional databases, providing a resource that links genomic data, molecular interactions, and biological understanding. Its multifaceted applications continue to drive innovation across research domains, from pharmaceutical development to fundamental biological investigations.
KEGG Database: Current Statistics and Usage
As of December 2024, the KEGG database continues its dynamic expansion, maintaining its status as a cornerstone in the field of bioinformatics.
A KEGG Global Metabolic Pathway generated with the KEGGscape app. (Nishida, K., et al., 2014)
Current Statistics:
- Pathway Maps and Gene Catalogs:
  - KEGG hosts 576 pathway maps covering metabolic, signaling, and other biochemical pathways.
  - The database incorporates over 1.3 million references, offering a comprehensive dataset for diverse research initiatives.
- Diversity of Organisms:
  - The gene catalog holds over 56 million entries from a wide spectrum of organisms, supporting comparative genomics and multi-species research.
  - There are 27,293 orthology groups, which are integral for identifying conserved gene functions across species.
Future Directions: What’s Next for KEGG
With the progression of biological sciences, the KEGG database is poised for several enhancements to support the research community more effectively.
- Integration with Other Databases:
Future updates to KEGG are likely to include enhanced integration with complementary databases such as UniProt and Gene Ontology, enriching the accessibility and functionality of data for researchers.
- Expansion into Metagenomics and Personalized Medicine:
As fields like metagenomics and personalized medicine gain prominence, KEGG plans to expand its resources to cater to these areas. This will involve providing more detailed genomic and functional data tailored to individual organisms and microbial communities.
If you want to learn about Gene Ontology, you can read the following article:
Comprehensive Guide to Gene Ontology (GO) Analysis and Its Applications in Genomics
Conclusion
In conclusion, KEGG remains an invaluable resource for advancing research across genomics, systems biology, and drug discovery. Its comprehensive compendium of biological pathways, genes, and chemical compounds furnishes critical insights necessary for elucidating complex biological systems.
To leverage KEGG’s powerful datasets, researchers are encouraged to download resources pertinent to their studies, whether it be analyzing metabolic pathways, exploring gene functions, or investigating potential drug targets. KEGG’s wide range of data can greatly advance scientific research.
For specialized bioinformatics services or genomic data analysis, consider exploring CD Genomics’ solutions. Our experts are equipped to assist with complex data interpretation and provide customized services tailored to your specific research objectives.
References
- Chen, L., Chu, C., Lu, J., Kong, X., Huang, T., & Cai, Y.-D. (2015). Gene Ontology and KEGG Pathway Enrichment Analysis of a Drug Target-Based Classification System. PLoS ONE, 10(5), e0126492. https://doi.org/10.1371/journal.pone.0126492.
- Kanehisa, M., Sato, Y., Furumichi, M., Morishima, K., & Tanabe, M. (2019). New approach for understanding genome variations in KEGG. Nucleic Acids Research, 47(D1), D590–D595. https://doi.org/10.1093/nar/gky962.
- Kanehisa, M. (2016). KEGG Bioinformatics Resource for Plant Genomics and Metabolomics. In: Edwards, D. (eds) Plant Bioinformatics. Methods in Molecular Biology, vol 1374. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3167-5_3.
- Sato, N., Uematsu, M., Fujimoto, K., Uematsu, S., & Imoto, S. (2023). ggkegg: analysis and visualization of KEGG data utilizing the grammar of graphics. Bioinformatics, 39(10), btad622. https://doi.org/10.1093/bioinformatics/btad622.
- Kim, I., Choi, S., & Kim, S. (2018). BRCA-Pathway: a structural integration and visualization system of TCGA breast cancer data on KEGG pathways. BMC Bioinformatics, 19(Suppl 1), 42. https://doi.org/10.1186/s12859-018-2016-6.
- Pedersen, T.L., et al. (2023). ggkegg: analysis and visualization of KEGG data utilizing the tidygraph framework for network analysis in R. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac457.
- Zhuang, F., Bai, X., Shi, Y., Chang, L., Ai, W., Du, J., … & Hong, T. (2022). Metabolomic profiling identifies biomarkers and metabolic impacts of surgery for colorectal cancer. Frontiers in Surgery, 9, 913967. https://doi.org/10.3389/fsurg.2022.913967
- Nishida, K., Ono, K., Kanaya, S., & Takahashi, K. (2014). KEGGscape: a Cytoscape app for pathway data integration. F1000Research, 3. https://doi.org/10.12688/f1000research.4524.1
- Li, Z., Li, X., He, X., Jia, X., Zhang, X., Lu, B., … & Dong, Z. (2020). Proteomics reveal the inhibitory mechanism of levodopa against esophageal squamous cell carcinoma. Frontiers in pharmacology, 11, 568459. https://doi.org/10.3389/fphar.2020.568459