name: kegg-database
description: "KEGG REST API (academic only). Pathways, genes, compounds, enzymes, diseases, drugs via 7 ops (info/list/find/get/conv/link/ddi). ID conversion (NCBI/UniProt/PubChem). Use bioservices for multi-DB Python."
license: Non-academic use of KEGG requires a commercial license
KEGG Database — Biological Pathway & Molecular Network Queries
Overview
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a comprehensive bioinformatics resource for biological pathway analysis, molecular interaction networks, and cross-database ID conversion. Access is via a direct REST API with no authentication: every operation is a simple HTTP GET request that returns plain text (tab-delimited rows for list, find, conv, and link; flat-file records for get).
When to Use
- Mapping genes to biological pathways (e.g., "which pathways involve TP53?")
- Retrieving metabolic pathway details, gene lists, or compound structures
- Converting identifiers between KEGG, NCBI Gene, UniProt, and PubChem
- Checking drug-drug interactions from KEGG's pharmacological database
- Building pathway enrichment context (all genes per pathway for an organism)
- Cross-referencing compounds, reactions, enzymes, and pathways
- For Python-native multi-database queries (KEGG + UniProt + Ensembl in one script), prefer bioservices instead
- For pathway visualization, use KEGG Mapper (https://www.kegg.jp/kegg/mapper/) directly
Prerequisites
pip install requests
API constraints:
- Academic use only — commercial use requires a separate KEGG license
- Max 10 entries per get/list/conv/link/ddi call (image/kgml/json: 1 entry only)
- No explicit rate limit, but add time.sleep(0.5) between batch requests to avoid server-side throttling
- Base URL: https://rest.kegg.jp/
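The entry limit above can be handled by splitting a long ID list into batches before building URLs. The following is a minimal sketch, not part of KEGG or any library: the helper names chunk_ids and batched_urls are hypothetical, and the limit of 10 is taken from the constraints listed above.

```python
from typing import Iterator, List, Sequence

BASE = "https://rest.kegg.jp"
MAX_ENTRIES = 10  # batch limit stated in the constraints above


def chunk_ids(ids: Sequence[str], size: int = MAX_ENTRIES) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def batched_urls(operation: str, ids: Sequence[str], size: int = MAX_ENTRIES) -> List[str]:
    """Build one URL per batch, joining the IDs of each batch with '+'."""
    return [f"{BASE}/{operation}/{'+'.join(batch)}" for batch in chunk_ids(ids, size)]


# Twelve placeholder gene IDs produce two URLs (10 IDs, then 2)
demo_ids = [f"hsa:{n}" for n in range(1, 13)]
demo_urls = batched_urls("get", demo_ids)
print(len(demo_urls))
```

Each URL can then be fetched in turn with requests.get, with time.sleep(0.5) between calls as suggested above.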
Quick Start
import requests
import time
BASE = "https://rest.kegg.jp"
def kegg_get(operation, *args):
    """Generic KEGG REST API caller."""
    url = f"{BASE}/{operation}/{'/'.join(args)}"
    resp = requests.get(url)
    resp.raise_for_status()
    return resp.text
# Find pathways linked to human gene TP53
pathways = kegg_get("link", "pathway", "hsa:7157")
print(pathways[:200])
# hsa:7157 path:hsa04010
# hsa:7157 path:hsa04110
# ...
# Get pathway details
detail = kegg_get("get", "hsa04110")
print(detail[:300])
Core API
1. Database Information — kegg_info
Retrieve metadata and statistics about KEGG databases.
import requests
BASE = "https://rest.kegg.jp"
# Database-level info
info = requests.get(f"{BASE}/info/pathway").text
print(info[:200])
# pathway Pathway
# Release 112.0, Dec 2025
# Kanehisa Laboratories
# ...
# Organism-level info
hsa_info = requests.get(f"{BASE}/info/hsa").text
print(hsa_info[:200])
Common databases: kegg, pathway, module, brite, genes, genome, compound, glycan, reaction, enzyme, disease, drug
2. Listing Entries — kegg_list
List entry identifiers and names from any KEGG database.
import requests
BASE = "https://rest.kegg.jp"
# All human pathways
hsa_pathways = requests.get(f"{BASE}/list/pathway/hsa").text
for line in hsa_pathways.strip().split("\n")[:5]:
    pathway_id, name = line.split("\t")
    print(f"{pathway_id}: {name}")
# path:hsa00010: Glycolysis / Gluconeogenesis - Homo sapiens (human)
# ...
# Specific entries (max 10, joined with +)
genes = requests.get(f"{BASE}/list/hsa:10458+hsa:10459").text
print(genes)
Common organism codes: hsa (human), mmu (mouse), dme (fruit fly), sce (yeast), eco (E. coli)
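Because list output is one tab-separated row per entry, it converts naturally into a dictionary. A minimal sketch, assuming the two-column shape shown in the example above; parse_kegg_table is a hypothetical helper name, and the sample text is typed in rather than fetched.

```python
from typing import Dict


def parse_kegg_table(text: str) -> Dict[str, str]:
    """Map the first tab-separated column to the rest of each line."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        entries[key] = value
    return entries


# Sample rows in the two-column shape shown above (not fetched live)
sample = (
    "path:hsa00010\tGlycolysis / Gluconeogenesis - Homo sapiens (human)\n"
    "path:hsa00020\tCitrate cycle (TCA cycle) - Homo sapiens (human)\n"
)
table = parse_kegg_table(sample)
print(table["path:hsa00010"])
```

Using partition rather than split keeps any later tabs inside the value, so rows with extra columns do not raise an unpacking error.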
3. Keyword Search — kegg_find
Search databases by keywords or molecular properties.
import requests
import time
BASE = "https://rest.kegg.jp"
# Keyword search in genes
results = requests.get(f"{BASE}/find/genes/p53").text
print(f"Found {len(results.strip().split(chr(10)))} entries")
time.sleep(0.5)
# Chemical formula search (partial match, atom order ignored)
compounds = requests.get(f"{BASE}/find/compound/C7H10N4O2/formula").text
print(compounds[:200])
time.sleep(0.5)
# Exact mass range search
drugs = requests.get(f"{BASE}/find/drug/300-310/exact_mass").text
print(drugs[:200])
Search options: append /formula (partial match, atom order ignored), /exact_mass, or /mol_weight (each accepts a single value or a min-max range) to compound/drug queries.
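A small URL builder can catch a mistyped search option before any request is sent. This is a sketch under assumptions: build_find_url is a hypothetical helper, the option list is copied from the line above, and joining keywords with '+' follows the query style used elsewhere in this document.

```python
from typing import Optional
from urllib.parse import quote

BASE = "https://rest.kegg.jp"
FIND_OPTIONS = ("formula", "exact_mass", "mol_weight")  # options listed above


def build_find_url(database: str, query: str, option: Optional[str] = None) -> str:
    """Return a /find URL; `option` must be one of FIND_OPTIONS or None."""
    if option is not None and option not in FIND_OPTIONS:
        raise ValueError(f"unsupported find option: {option!r}")
    # Join whitespace-separated keywords with '+'
    keywords = "+".join(query.split())
    url = f"{BASE}/find/{database}/{quote(keywords, safe='+')}"
    return f"{url}/{option}" if option else url


formula_url = build_find_url("compound", "C7H10N4O2", "formula")
keyword_url = build_find_url("genes", "BRCA1 homo sapiens")
print(formula_url)
print(keyword_url)
```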
4. Entry Retrieval — kegg_get
Retrieve complete database entries or specific data formats.
import requests
import time
BASE = "https://rest.kegg.jp"
# Full pathway entry (text format)
pathway = requests.get(f"{BASE}/get/hsa00010").text
print(pathway[:500])
time.sleep(0.5)
# Multiple entries (max 10, joined with +)
genes = requests.get(f"{BASE}/get/hsa:10458+hsa:10459").text
# Protein sequence (FASTA)
fasta = requests.get(f"{BASE}/get/hsa:10458/aaseq").text
print(fasta[:200])
time.sleep(0.5)
# Compound structure (MOL format)
mol = requests.get(f"{BASE}/get/cpd:C00002/mol").text # ATP
# Pathway image (PNG, single entry only)
img_resp = requests.get(f"{BASE}/get/hsa05130/image")
with open("pathway.png", "wb") as f:
    f.write(img_resp.content)
print(f"Saved pathway image: {len(img_resp.content)} bytes")
Output formats: aaseq (protein FASTA), ntseq (nucleotide FASTA), mol (MOL), kcf (KCF), image (PNG), kgml (pathway XML), json (BRITE hierarchy JSON). Image/KGML/JSON accept one entry only.
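When several entries are requested in one get call, the text response holds one flat-file record per entry. A minimal sketch of separating them, assuming each record ends with a '///' line (the same terminator the flat-file parsing recipe later in this document relies on); split_flat_file is a hypothetical name and the sample records are abbreviated and made up.

```python
from typing import List


def split_flat_file(text: str) -> List[str]:
    """Split a multi-entry flat-file response on the '///' terminator lines."""
    entries: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith("///"):
            if current:
                entries.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a terminator, ignoring blank leftovers
    if any(part.strip() for part in current):
        entries.append("\n".join(current))
    return entries


# Abbreviated, made-up records in the flat-file layout (not real KEGG output)
sample = (
    "ENTRY       10458             CDS       T01001\n"
    "NAME        example-gene-a\n"
    "///\n"
    "ENTRY       10459             CDS       T01001\n"
    "NAME        example-gene-b\n"
    "///\n"
)
records = split_flat_file(sample)
print(len(records))
```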
5. ID Conversion — kegg_conv
Convert identifiers between KEGG and external databases.
import requests
import time
BASE = "https://rest.kegg.jp"
# KEGG gene → NCBI Gene ID (specific gene)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/hsa:10458").text
print(ncbi.strip())
# hsa:10458 ncbi-geneid:10458
time.sleep(0.5)
# KEGG gene → UniProt
uniprot = requests.get(f"{BASE}/conv/uniprot/hsa:10458").text
print(uniprot.strip())
time.sleep(0.5)
# Bulk conversion: all human genes → NCBI Gene IDs
all_conv = requests.get(f"{BASE}/conv/ncbi-geneid/hsa").text
lines = all_conv.strip().split("\n")
print(f"Total conversions: {len(lines)}")
# Reverse: NCBI Gene ID → KEGG
reverse = requests.get(f"{BASE}/conv/hsa/ncbi-geneid:7157").text
print(reverse.strip()) # TP53
Supported external databases: ncbi-geneid, ncbi-proteinid, uniprot, pubchem, chebi
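Conversion output pairs one ID per column, and a single KEGG entry may map to several external IDs. The sketch below collects those pairs into lists and strips the database prefix; parse_conv is a hypothetical helper, and apart from the first row (taken from the example above) the sample IDs are invented.

```python
from typing import Dict, List


def parse_conv(text: str) -> Dict[str, List[str]]:
    """Map each first-column ID to its second-column IDs, prefixes removed."""
    mapping: Dict[str, List[str]] = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        source, target = parts
        bare_target = target.split(":", 1)[-1]  # "ncbi-geneid:10458" -> "10458"
        mapping.setdefault(source, []).append(bare_target)
    return mapping


# First row follows the example above; the other rows are invented
sample = (
    "hsa:10458\tncbi-geneid:10458\n"
    "hsa:0000\tuniprot:X00001\n"
    "hsa:0000\tuniprot:X00002\n"
)
conv_map = parse_conv(sample)
print(conv_map)
```

Keeping a list per source ID avoids silently dropping the extra targets in one-to-many mappings.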
6. Cross-Referencing — kegg_link
Find related entries within and between KEGG databases.
import requests
import time
BASE = "https://rest.kegg.jp"
# Genes in glycolysis pathway
genes = requests.get(f"{BASE}/link/genes/hsa00010").text
gene_list = [line.split("\t")[1] for line in genes.strip().split("\n") if line]
print(f"Glycolysis genes: {len(gene_list)}")
time.sleep(0.5)
# Pathways containing a specific gene
pathways = requests.get(f"{BASE}/link/pathway/hsa:7157").text # TP53
print(pathways[:300])
time.sleep(0.5)
# Compounds in a pathway
compounds = requests.get(f"{BASE}/link/compound/hsa00010").text
print(f"Compounds in glycolysis: {len(compounds.strip().split(chr(10)))}")
# Map genes to KO (orthology) groups
ko = requests.get(f"{BASE}/link/ko/hsa:10458").text
print(ko.strip())
Common links: genes ↔ pathway, pathway ↔ compound, pathway ↔ enzyme, genes ↔ ko (orthology)
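Link output can be grouped in either direction, for example to ask which pathways two genes share. A minimal sketch with hypothetical helper names (links_to_sets, invert_links) and placeholder IDs that are not real KEGG entries:

```python
from collections import defaultdict
from typing import Dict, Set


def links_to_sets(text: str) -> Dict[str, Set[str]]:
    """Group the second column of link output by the first column."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2:
            grouped[parts[0]].add(parts[1])
    return dict(grouped)


def invert_links(grouped: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Swap direction: map each target back to the sources that link to it."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for source, targets in grouped.items():
        for target in targets:
            inverted[target].add(source)
    return dict(inverted)


# Placeholder rows in the two-column link layout
sample = "gene:A\tpath:x00001\ngene:A\tpath:x00002\ngene:B\tpath:x00002\n"
by_gene = links_to_sets(sample)
by_pathway = invert_links(by_gene)
shared = by_gene["gene:A"] & by_gene["gene:B"]
print(sorted(shared))
```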
7. Drug-Drug Interactions — kegg_ddi
Check pharmacological interactions between drugs.
import requests
BASE = "https://rest.kegg.jp"
# Single drug — all known interactions
interactions = requests.get(f"{BASE}/ddi/D00001").text
print(f"Interactions: {len(interactions.strip().split(chr(10)))}")
# Pairwise check (max 10 drugs, joined with +)
pair = requests.get(f"{BASE}/ddi/D00001+D00002+D00003").text
print(pair[:300])
Key Concepts
Identifier Formats
| Type | Format | Example |
|---|---|---|
| Reference pathway | map##### | map00010 (Glycolysis, generic) |
| Organism pathway | {org}##### | hsa00010 (Glycolysis, human) |
| Gene | {org}:{number} | hsa:7157 (TP53) |
| Compound | cpd:C##### | cpd:C00002 (ATP) |
| Drug | dr:D##### | dr:D00001 |
| Enzyme | ec:{EC_number} | ec:1.1.1.1 |
| KO (orthology) | ko:K##### | ko:K00001 |
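The formats in this table can be checked with regular expressions before a request is built. The patterns below are an illustrative approximation derived only from the table rows (they assume 3-4 lowercase letters for organism codes and do not cover every KEGG ID type); classify_kegg_id is a hypothetical helper.

```python
import re
from typing import Optional

# Order matters: specific prefixes are tested before the generic gene pattern
ID_PATTERNS = [
    ("reference pathway", re.compile(r"^map\d{5}$")),
    ("organism pathway", re.compile(r"^[a-z]{3,4}\d{5}$")),
    ("compound", re.compile(r"^cpd:C\d{5}$")),
    ("drug", re.compile(r"^dr:D\d{5}$")),
    ("enzyme", re.compile(r"^ec:\d+(\.(\d+|-)){3}$")),
    ("KO (orthology)", re.compile(r"^ko:K\d{5}$")),
    ("gene", re.compile(r"^[a-z]{3,4}:\S+$")),
]


def classify_kegg_id(kegg_id: str) -> Optional[str]:
    """Return the table's type label for an ID, or None if nothing matches."""
    if kegg_id.startswith("path:"):
        kegg_id = kegg_id[len("path:"):]  # drop the pathway prefix first
    for label, pattern in ID_PATTERNS:
        if pattern.match(kegg_id):
            return label
    return None


for example in ("map00010", "hsa00010", "hsa:7157", "cpd:C00002", "ec:1.1.1.1"):
    print(example, "->", classify_kegg_id(example))
```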
Pathway Categories
KEGG organizes pathways into seven major categories (the map ID prefixes shown are approximate):
- Metabolism: map00xxx, map01xxx (Glycolysis, TCA cycle, amino acid metabolism)
- Genetic Information Processing: map03xxx (Ribosome, Spliceosome, DNA repair)
- Environmental Information Processing: map02xxx, map040xx (ABC transporters, MAPK signaling)
- Cellular Processes: map041xx, map042xx (Autophagy, Apoptosis, Cell cycle)
- Organismal Systems: map046xx to map049xx (Immune, Endocrine, Nervous)
- Human Diseases: map05xxx (Cancer, Neurodegenerative, Infectious)
- Drug Development: chronological and target-based classifications
Common Workflows
Workflow: Gene to Pathway Mapping
Find all pathways associated with a gene of interest.
import requests
import time
BASE = "https://rest.kegg.jp"
# Step 1: Find gene by keyword
results = requests.get(f"{BASE}/find/genes/BRCA1+homo+sapiens").text
print("Gene search results:")
for line in results.strip().split("\n")[:5]:
    print(f" {line}")
time.sleep(0.5)
# Step 2: Get pathways linked to BRCA1
pathways = requests.get(f"{BASE}/link/pathway/hsa:672").text
pathway_ids = [line.split("\t")[1].replace("path:", "") for line in pathways.strip().split("\n") if line]
print(f"\nBRCA1 is in {len(pathway_ids)} pathways:")
time.sleep(0.5)
# Step 3: Get pathway names
for pid in pathway_ids[:5]:
    info = requests.get(f"{BASE}/get/{pid}").text
    # Extract NAME field
    for line in info.split("\n"):
        if line.startswith("NAME"):
            print(f" {pid}: {line.replace('NAME', '').strip()}")
            break
    time.sleep(0.5)
Workflow: Pathway Enrichment Context
Build a gene-set collection for all pathways of an organism.
import requests
import time
BASE = "https://rest.kegg.jp"
# Step 1: List all human pathways
pathways_text = requests.get(f"{BASE}/list/pathway/hsa").text
pathways = {}
for line in pathways_text.strip().split("\n"):
    pid, name = line.split("\t", 1)
    pathways[pid.replace("path:", "")] = name
print(f"Total human pathways: {len(pathways)}")
time.sleep(0.5)
# Step 2: Get genes for each pathway (sample first 3 for demo)
gene_sets = {}
for pid in list(pathways.keys())[:3]:
    genes_text = requests.get(f"{BASE}/link/genes/{pid}").text
    gene_ids = [line.split("\t")[1] for line in genes_text.strip().split("\n") if line]
    gene_sets[pid] = gene_ids
    print(f" {pid}: {len(gene_ids)} genes")
    time.sleep(0.5)
# Step 3: Convert to NCBI Gene IDs for enrichment tools
# (use kegg_conv for bulk conversion)
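Step 3 is left as a comment above. One possible way to carry it out is sketched here, assuming the bulk conversion text pairs a KEGG gene ID with an ncbi-geneid: ID on each row (the column order is detected rather than assumed). The helper names and the sample data are hypothetical stand-ins for the conv response and the gene_sets dictionary.

```python
from typing import Dict, List


def build_ncbi_lookup(conv_text: str) -> Dict[str, str]:
    """Map KEGG gene IDs to bare NCBI Gene IDs, whichever column holds which."""
    lookup: Dict[str, str] = {}
    for line in conv_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        kegg_id, ncbi_id = parts
        if kegg_id.startswith("ncbi-geneid:"):
            kegg_id, ncbi_id = ncbi_id, kegg_id  # columns were reversed
        if ncbi_id.startswith("ncbi-geneid:"):
            lookup[kegg_id] = ncbi_id.split(":", 1)[1]
    return lookup


def convert_gene_sets(
    gene_sets: Dict[str, List[str]], lookup: Dict[str, str]
) -> Dict[str, List[str]]:
    """Replace KEGG gene IDs with NCBI Gene IDs, dropping unmapped genes."""
    return {
        pid: [lookup[gene] for gene in genes if gene in lookup]
        for pid, genes in gene_sets.items()
    }


# Made-up stand-ins for the conv response and the gene_sets dict
conv_sample = "hsa:7157\tncbi-geneid:7157\nncbi-geneid:672\thsa:672\n"
demo_sets = {"hsa00001": ["hsa:7157", "hsa:672", "hsa:99999999"]}
converted = convert_gene_sets(demo_sets, build_ncbi_lookup(conv_sample))
print(converted)
```

Genes without a mapping are dropped silently in this sketch; logging them may be preferable when coverage matters for the enrichment analysis.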
Workflow: Compound-Pathway-Reaction Analysis
Trace a compound through metabolic reactions and pathways.
import requests
import time
BASE = "https://rest.kegg.jp"
# Step 1: Search for compound
results = requests.get(f"{BASE}/find/compound/glucose").text
print("Compound search:")
for line in results.strip().split("\n")[:3]:
    print(f" {line}")
time.sleep(0.5)
# Step 2: Find reactions involving glucose (C00031)
reactions = requests.get(f"{BASE}/link/reaction/cpd:C00031").text
rxn_ids = [line.split("\t")[1] for line in reactions.strip().split("\n") if line]
print(f"\nReactions involving glucose: {len(rxn_ids)}")
time.sleep(0.5)
# Step 3: Find pathways for a specific reaction
pathways = requests.get(f"{BASE}/link/pathway/rn:R00299").text
print(f"\nPathways for R00299:")
print(pathways[:300])
time.sleep(0.5)
# Step 4: Get pathway detail
detail = requests.get(f"{BASE}/get/map00010").text
print(f"\nGlycolysis pathway detail (first 500 chars):")
print(detail[:500])
Workflow: Cross-Database ID Integration
Map KEGG identifiers to UniProt, NCBI, and PubChem for multi-database workflows.
import requests
import time
BASE = "https://rest.kegg.jp"
# Step 1: Convert gene to multiple external IDs
gene = "hsa:7157" # TP53
uniprot = requests.get(f"{BASE}/conv/uniprot/{gene}").text.strip()
print(f"UniProt: {uniprot}")
time.sleep(0.5)
ncbi = requests.get(f"{BASE}/conv/ncbi-geneid/{gene}").text.strip()
print(f"NCBI Gene: {ncbi}")
time.sleep(0.5)
# Step 2: Get protein sequence from KEGG
fasta = requests.get(f"{BASE}/get/{gene}/aaseq").text
print(f"\nProtein sequence (first 200 chars):\n{fasta[:200]}")
time.sleep(0.5)
# Step 3: Convert compounds to PubChem CIDs
cpd_conv = requests.get(f"{BASE}/conv/pubchem/cpd:C00002").text.strip() # ATP
print(f"\nATP PubChem: {cpd_conv}")
Key Parameters
| Parameter | Function/Endpoint | Default | Options | Effect |
|---|---|---|---|---|
| organism | list, link, conv | None | 3-4 letter code | Filter by organism (e.g., hsa, mmu) |
| option | find | None | formula, exact_mass, mol_weight | Search mode for compounds/drugs |
| format | get | text | aaseq, ntseq, mol, kcf, image, kgml, json | Output format |
| + separator | get, list, conv, link, ddi | n/a | Max 10 entries | Batch query (join IDs with +) |
| target_db | conv | n/a | ncbi-geneid, uniprot, pubchem, chebi | External database for ID conversion |
| target_db | link | n/a | pathway, genes, compound, ko, enzyme | Related KEGG database |
Best Practices
- Add delays between batch requests: No explicit rate limit is documented, but time.sleep(0.5) between requests helps avoid throttling and is courteous to the shared academic resource.
- Avoid fetching all entries without filtering (anti-pattern): Use kegg_list to enumerate IDs first, then kegg_get for specific entries. Do not download entire databases when you need a subset.
- Parse tab-delimited output consistently: Tabular KEGG responses (list, find, conv, link) use \t as the field separator and \n as the record separator; get returns flat-file records instead. Always .strip() before splitting.
- Respect the 10-entry batch limit: kegg_get, kegg_list, kegg_conv, kegg_link, and kegg_ddi accept at most 10 entries (joined with +). Image/KGML/JSON formats accept only 1.
- Use organism-specific pathway IDs: hsa00010 (human glycolysis) returns organism-specific gene mappings; map00010 (reference) returns generic entries. Prefer the organism-specific ID when analyzing a known organism.
- Cache frequently used conversions: Full organism ID conversions (kegg_conv('ncbi-geneid', 'hsa')) return large results. Cache them locally rather than repeating the request.
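The caching advice can be sketched as a small on-disk cache keyed by URL. Everything here is an assumption made for illustration: cached_fetch is a hypothetical helper, the fetch argument is any callable that returns response text for a URL (for example a thin wrapper around requests.get), and the demo uses a fake fetcher so nothing is downloaded.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, List


def cached_fetch(url: str, fetch: Callable[[str], str], cache_dir: str = "kegg_cache") -> str:
    """Return cached text for `url`, calling `fetch` only on a cache miss."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    target = folder / f"{key}.txt"
    if target.exists():
        return target.read_text(encoding="utf-8")
    text = fetch(url)
    target.write_text(text, encoding="utf-8")
    return text


# Demo with a fake fetcher that records how often it is called
calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "hsa:7157\tncbi-geneid:7157\n"


with tempfile.TemporaryDirectory() as tmp:
    demo_url = "https://rest.kegg.jp/conv/ncbi-geneid/hsa"
    first = cached_fetch(demo_url, fake_fetch, tmp)
    second = cached_fetch(demo_url, fake_fetch, tmp)

print(len(calls))
```

This sketch never expires entries, so the cache folder should be cleared when a newer KEGG release is needed.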
Common Recipes
Recipe: Parse KEGG Flat-File Entry
def parse_kegg_entry(text):
    """Parse a KEGG flat-file entry into a dictionary."""
    entry = {}
    current_key = None
    for line in text.split("\n"):
        if line.startswith("///"):
            break
        if line[:12].strip():  # New field
            current_key = line[:12].strip()
            entry[current_key] = line[12:].strip()
        elif current_key:  # Continuation
            entry[current_key] += "\n" + line[12:].strip()
    return entry
import requests
pathway = requests.get("https://rest.kegg.jp/get/hsa00010").text
parsed = parse_kegg_entry(pathway)
print(f"Name: {parsed.get('NAME', 'N/A')}")
print(f"Description: {parsed.get('DESCRIPTION', 'N/A')[:200]}")
Recipe: Organism Comparison
import requests
import time
BASE = "https://rest.kegg.jp"
organisms = {"hsa": "Human", "mmu": "Mouse", "sce": "Yeast"}
pathway = "00010" # Glycolysis
for org, name in organisms.items():
    genes = requests.get(f"{BASE}/link/genes/{org}{pathway}").text
    count = len([l for l in genes.strip().split("\n") if l])
    print(f"{name} ({org}): {count} genes in Glycolysis")
    time.sleep(0.5)
# Human (hsa): 68 genes in Glycolysis
# Mouse (mmu): 67 genes in Glycolysis
# Yeast (sce): 31 genes in Glycolysis
Recipe: Build Gene-to-Pathway Mapping Table
import requests
import time
BASE = "https://rest.kegg.jp"
# Get all human gene-pathway links
links = requests.get(f"{BASE}/link/pathway/hsa").text
gene_pathways = {}
for line in links.strip().split("\n"):
    if not line:
        continue
    gene, pathway = line.split("\t")
    gene_pathways.setdefault(gene, []).append(pathway.replace("path:", ""))
print(f"Genes with pathway annotations: {len(gene_pathways)}")
# Show top genes by pathway count
top = sorted(gene_pathways.items(), key=lambda x: -len(x[1]))[:5]
for gene, paths in top:
    print(f" {gene}: {len(paths)} pathways")
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| 404 Not Found | Entry or database doesn't exist | Verify ID format and organism code; use kegg_list to check valid IDs |
| 400 Bad Request | Malformed API URL | Check URL path: /{operation}/{arg1}/{arg2}; no query params |
| Empty response | Search term too specific or no matches | Broaden keywords; try partial matches; check organism code |
| Image/KGML returns error | Batch query with image/kgml/json format | These formats accept one entry only — remove + joins |
| 403 Forbidden | Server-side rate limiting | Add time.sleep(1) between requests; reduce batch frequency |
| Wrong gene IDs returned | Using reference pathway (map) instead of organism-specific | Use organism prefix: hsa00010 not map00010 for gene links |
| ID conversion returns empty | External DB doesn't cover that entry | Not all KEGG entries have UniProt/NCBI mappings; check with kegg_list first |
| Response encoding issues | Non-ASCII characters in compound names | Set resp.encoding = 'utf-8' before reading resp.text |
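The 403 row above suggests waiting and trying again. A minimal retry sketch follows, with doubling waits chosen here as an assumption rather than anything KEGG documents. get_with_retry is a hypothetical helper; it takes a fetch callable returning a (status code, text) pair so it can be exercised without network access.

```python
import time
from typing import Callable, List, Tuple


def get_with_retry(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    retries: int = 3,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url) -> (status, text); retry on 403 with doubling waits."""
    for attempt in range(retries + 1):
        status, text = fetch(url)
        if status == 200:
            return text
        if status == 403 and attempt < retries:
            sleep(delay * 2 ** attempt)  # wait 1x, 2x, 4x ... the base delay
            continue
        raise RuntimeError(f"HTTP {status} for {url}")
    raise ValueError("retries must be >= 0")


# Demo: a fake fetcher that is throttled twice before succeeding
responses = iter([(403, ""), (403, ""), (200, "ok")])
waits: List[float] = []
result = get_with_retry(
    lambda url: next(responses),
    "https://rest.kegg.jp/info/kegg",
    sleep=waits.append,  # record waits instead of actually sleeping
)
print(result, waits)
```

With requests, the fetch argument could be a small adapter that calls requests.get(url) and returns (resp.status_code, resp.text).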
Related Skills
- gget-genomic-databases: unified Python interface to Ensembl, NCBI, UniProt; use for gene-level queries when KEGG pathway context isn't needed
- biopython-molecular-biology: Biopython's Bio.KEGG module provides an alternative Python API for KEGG parsing
- pubchem-compound-search: for compound property lookups beyond KEGG's structural data; use kegg_conv('pubchem', ...) to bridge IDs
References
- KEGG REST API documentation — official API specification
- KEGG website — pathway browser, KEGG Mapper, BlastKOALA
- KEGG organism codes — full list of 3-4 letter organism codes
- Kanehisa, M. et al. (2023) "KEGG for taxonomy-based analysis of pathways and genomes" Nucleic Acids Research 51:D587-D592