---
name: "archs4-database"
description: "Query ARCHS4 REST API for uniformly processed RNA-seq expression, tissue patterns, co-expression across 1M+ human/mouse samples. Retrieve z-scores, co-expressed genes, samples by metadata, HDF5 matrices. For variant population genetics use gnomad-database; for pathway enrichment use gget-genomic-databases (Enrichr)."
license: "CC-BY-4.0"
---
ARCHS4 Database
Overview
ARCHS4 (All RNA-seq and ChIP-seq Sample and Signature Search) is a resource of uniformly aligned and processed human and mouse RNA-seq data from NCBI GEO and SRA, covering 1 million+ samples. The REST API at https://maayanlab.cloud/archs4/api/ provides gene-level expression profiles, z-score normalized tissue expression, co-expression networks, and sample metadata search — all without authentication. Large-scale bulk queries can also use the downloadable HDF5 expression matrices.
When to Use
- Retrieving tissue-specific or cell-type-specific expression z-scores for a gene of interest across hundreds of tissue types
- Finding genes co-expressed with a query gene (co-expression network construction or guilt-by-association analysis)
- Searching for RNA-seq samples by tissue, disease, or metadata keyword to identify candidate datasets for reanalysis
- Comparing expression profiles of multiple genes across tissues to prioritize candidates for wet-lab follow-up
- Accessing uniformly processed gene expression matrices (HDF5 format) for large-scale cross-study analysis
- Validating differential expression results by checking whether a gene's expression direction matches population-level tissue profiles
- For variant-level population allele frequencies, use `gnomad-database`; ARCHS4 provides expression evidence only
- For Enrichr pathway enrichment from a gene list, use `gget-genomic-databases` (`gget enrichr`); ARCHS4 is for expression lookups
Prerequisites
- Python packages: `requests`, `pandas`, `matplotlib`, `seaborn`; `h5py` is optional and only needed for local HDF5 access
- Data requirements: gene symbols (HGNC format for human, e.g., `TP53`, `BRCA1`; MGI-style for mouse, e.g., `Trp53`); sample GEO/SRA IDs for direct sample queries
- Environment: internet connection; no API key or account required
- Rate limits: ~10 requests/second; add `time.sleep(0.1)` between sequential gene queries to avoid throttling

```bash
pip install requests pandas matplotlib seaborn
```
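To keep the ~10 requests/second guidance in one place instead of scattering `time.sleep` calls, a small throttle can sit in front of every request. This is a minimal sketch assuming a fixed minimum interval between calls is enough to stay under the limit; `RateLimiter` is a hypothetical helper, not part of `requests` or any ARCHS4 client.

```python
import time


class RateLimiter:
    """Block so that successive wait() calls are at least 1/max_per_second apart."""

    def __init__(self, max_per_second: float = 10.0):
        self.min_interval = 1.0 / max_per_second
        self._last = None  # monotonic timestamp of the previous wait() return

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


# Hypothetical usage: call limiter.wait() immediately before each requests.get(...)
limiter = RateLimiter(max_per_second=10)
```

Sharing one `RateLimiter` instance across helper functions keeps the spacing correct even when requests are issued from different places in a script.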
Quick Start
```python
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def archs4_get(endpoint: str, params: dict = None) -> dict:
    """Send a GET request to the ARCHS4 API and return parsed JSON."""
    r = requests.get(f"{ARCHS4_BASE}/{endpoint}", params=params, timeout=30)
    r.raise_for_status()
    return r.json()


# Quick check: top tissues expressing TP53
data = archs4_get("meta/genes/TP53/zscore")
tissues = data.get("values", [])
print(f"TP53 tissue expression entries: {len(tissues)}")
top5 = sorted(tissues, key=lambda x: x.get("zscore", 0), reverse=True)[:5]
for t in top5:
    print(f"  {t['tissue']:<40} z={t['zscore']:.2f}")
# TP53 tissue expression entries: 200
# thymus z=2.81
# testis z=2.44
```
Core API
Query 1: Gene Expression Z-Scores Across Tissues
Retrieve z-score normalized expression for a gene across all available tissue types. Z-scores are computed for each gene relative to its expression distribution across samples; positive values indicate above-average expression for that gene.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def get_gene_tissue_zscore(gene_symbol: str, species: str = "human") -> pd.DataFrame:
    """Return tissue z-score expression profile for a gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC (human) or MGI (mouse) gene symbol (e.g., 'TP53').
    species : str
        'human' or 'mouse' (default: 'human').
    """
    endpoint = f"meta/genes/{gene_symbol}/zscore"
    r = requests.get(
        f"{ARCHS4_BASE}/{endpoint}",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("values", [])
    df = pd.DataFrame(records)
    if df.empty:
        # No tissues returned (e.g., wrong symbol case or species): nothing to sort
        return df
    return df.sort_values("zscore", ascending=False).reset_index(drop=True)


df = get_gene_tissue_zscore("MYC")
print(f"MYC tissue z-scores: {len(df)} tissue types")
print(df[["tissue", "zscore"]].head(10).to_string(index=False))
# MYC tissue z-scores: 200 tissue types
# tissue zscore
# colon 3.12
# small intestine 2.98
# placenta 2.74

# Query mouse tissues for a gene (MGI-style symbol)
df_mouse = get_gene_tissue_zscore("Myc", species="mouse")
print("Mouse Myc: top 5 tissues")
print(df_mouse[["tissue", "zscore"]].head(5).to_string(index=False))
```
Query 2: Co-expressed Genes
Find genes whose expression is most correlated with a query gene across all ARCHS4 samples. Useful for identifying pathway partners, regulators, or candidate targets.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def get_coexpressed_genes(gene_symbol: str, top_n: int = 50,
                          species: str = "human") -> pd.DataFrame:
    """Return genes co-expressed with the query gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC (human) or MGI (mouse) gene symbol.
    top_n : int
        Number of correlated genes to return (default: 50).
    species : str
        'human' or 'mouse' (default: 'human').
    """
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/correlations",
        params={"species": species, "limit": top_n},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("values", [])
    df = pd.DataFrame(records)
    if df.empty:
        # Nothing returned: avoid a KeyError on the missing "correlation" column
        return df
    return df.sort_values("correlation", ascending=False).reset_index(drop=True)


coexp = get_coexpressed_genes("PCNA", top_n=20)
print(f"Top co-expressed genes with PCNA (n={len(coexp)}):")
print(coexp[["gene", "correlation"]].head(10).to_string(index=False))
# Top co-expressed genes with PCNA (n=20):
# gene correlation
# RFC4 0.91
# RFC2 0.89
# MCM6 0.87

# Extract gene list for downstream enrichment
gene_list = coexp["gene"].tolist()
print(f"Co-expression gene list: {gene_list[:10]}")
# Pass gene_list to Enrichr or pathway analysis tools
```
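Key Parameters suggests a correlation cutoff of about 0.7 for high co-expression. The sketch below applies such a cutoff to the parsed `values` records before enrichment, assuming each record carries `gene` and `correlation` keys as in the code above; `filter_coexpressed` is a hypothetical helper and the example records are invented.

```python
def filter_coexpressed(records: list, query_gene: str,
                       min_correlation: float = 0.7) -> list:
    """Return partner symbols with correlation >= min_correlation, highest first.

    The query gene itself is dropped in case the response includes its
    self-correlation; records without a correlation value are skipped.
    """
    kept = [
        rec for rec in records
        if rec.get("correlation") is not None
        and rec["correlation"] >= min_correlation
        and rec.get("gene") != query_gene
    ]
    kept.sort(key=lambda rec: rec["correlation"], reverse=True)
    return [rec["gene"] for rec in kept]


# Invented records for illustration only (not real ARCHS4 values)
example = [
    {"gene": "PCNA", "correlation": 1.0},
    {"gene": "GENE_A", "correlation": 0.65},
    {"gene": "GENE_B", "correlation": 0.92},
    {"gene": "GENE_C", "correlation": 0.81},
]
print(filter_coexpressed(example, "PCNA"))  # ['GENE_B', 'GENE_C']
```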
Query 3: Sample Search
Search for RNA-seq samples by metadata keyword (tissue, disease condition, cell type, treatment). Returns GEO/SRA sample identifiers with metadata fields.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def search_samples(keyword: str, species: str = "human",
                   limit: int = 100) -> pd.DataFrame:
    """Search ARCHS4 samples by metadata keyword.

    Parameters
    ----------
    keyword : str
        Search term (e.g., 'breast cancer', 'liver', 'HeLa').
    species : str
        'human' or 'mouse'.
    limit : int
        Maximum number of samples to return.
    """
    r = requests.get(
        f"{ARCHS4_BASE}/samples/search",
        params={"query": keyword, "species": species, "limit": limit},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("samples", [])
    return pd.DataFrame(records)


samples = search_samples("pancreatic cancer", limit=50)
print(f"Samples matching 'pancreatic cancer': {len(samples)}")
if len(samples) > 0:
    print(samples[["sample_id", "series_id", "title"]].head(5).to_string(index=False))
# Samples matching 'pancreatic cancer': 50
# sample_id series_id title
# GSM2345678 GSE123456 Pancreatic ductal adenocarcinoma - sample 1
```
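Key Parameters lists an `offset` parameter for `/samples/search`. Assuming it behaves like a standard offset/limit pager (not confirmed here), larger result sets can be collected window by window. In this sketch the HTTP call is passed in as `fetch_page(offset, limit)`, so the paging logic stays separate from the request code; both `paginate` and `fetch_page` are hypothetical names.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], list], page_size: int = 100,
             max_records: int = 1000) -> Iterator[dict]:
    """Yield records from an offset/limit API until a short page or max_records.

    fetch_page(offset, limit) must return the list of records in that window.
    """
    offset = 0
    while offset < max_records:
        limit = min(page_size, max_records - offset)
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:  # a short (or empty) page means no more results
            break
        offset += limit


# Hypothetical wiring to the search endpoint (assumes `offset` works as listed):
# def fetch_page(offset, limit):
#     r = requests.get(f"{ARCHS4_BASE}/samples/search",
#                      params={"query": "liver", "species": "human",
#                              "limit": limit, "offset": offset}, timeout=30)
#     r.raise_for_status()
#     return r.json().get("samples", [])
#
# all_samples = pd.DataFrame(list(paginate(fetch_page, page_size=100, max_records=500)))
```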
Query 4: Gene-Level Metadata Summary
Retrieve summary statistics and metadata for a gene including the number of samples expressing it, expression percentile, and available annotation.
```python
import time
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def get_gene_metadata(gene_symbol: str, species: str = "human") -> dict:
    """Return metadata and expression summary for a gene."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    return r.json()


meta = get_gene_metadata("GAPDH")
print(f"Gene: {meta.get('gene_symbol', 'N/A')}")
print(f"Species: {meta.get('species', 'N/A')}")
print(f"Ensembl ID: {meta.get('ensembl_gene_id', 'N/A')}")
# `or` also covers a null description, not only a missing key
print(f"Description: {(meta.get('description') or 'N/A')[:80]}")

# Compare metadata for a panel of housekeeping genes
housekeeping = ["GAPDH", "ACTB", "B2M", "HPRT1", "RPLP0"]
for gene in housekeeping:
    meta = get_gene_metadata(gene)
    print(f"  {gene:<8} {meta.get('ensembl_gene_id', 'N/A')}")
    time.sleep(0.1)  # stay under the ~10 requests/second limit
```
Query 5: Visualization — Tissue Expression Barplot
Generate a publication-ready barplot of z-score expression across the top tissues for a gene.
```python
from typing import Optional

import requests
import pandas as pd
import matplotlib.pyplot as plt

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def plot_tissue_expression(gene_symbol: str, top_n: int = 20,
                           species: str = "human",
                           output_file: Optional[str] = None) -> None:
    """Plot top tissue z-score expression for a gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol.
    top_n : int
        Number of top tissues to display.
    species : str
        'human' or 'mouse'.
    output_file : str
        If provided, save figure to this path.
    """
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/zscore",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("values", [])
    df = pd.DataFrame(records).sort_values("zscore", ascending=False).head(top_n)

    fig, ax = plt.subplots(figsize=(10, 6))
    # Red for above-average expression, blue for below-average
    colors = ["#D73027" if z > 0 else "#4575B4" for z in df["zscore"]]
    # Reverse so the highest z-score is drawn at the top of the horizontal bars
    bars = ax.barh(df["tissue"][::-1], df["zscore"][::-1], color=colors[::-1])
    ax.axvline(0, color="black", linewidth=0.8, linestyle="--")
    ax.set_xlabel("Expression Z-Score")
    ax.set_title(f"ARCHS4 Tissue Expression: {gene_symbol} ({species})\nTop {top_n} tissues")
    ax.bar_label(bars, fmt="%.2f", padding=3, fontsize=8)
    plt.tight_layout()

    fname = output_file or f"{gene_symbol}_tissue_expression.png"
    plt.savefig(fname, dpi=150, bbox_inches="tight")
    plt.close(fig)  # release the figure when plotting many genes in a loop
    print(f"Saved {fname} ({len(df)} tissues plotted)")


plot_tissue_expression("BRCA1", top_n=15, output_file="BRCA1_tissue_expression.png")
```
Query 6: HDF5 Bulk Data Access
Download or stream from ARCHS4's precomputed HDF5 expression matrices for large-scale cross-sample analysis. The HDF5 files contain gene × sample count matrices for human and mouse.
```python
# HDF5 files are available for bulk download from the ARCHS4 data portal
# Download page: https://maayanlab.cloud/archs4/download#expression
# Human gene-level: human_gene_v2.6.h5
# Mouse gene-level: mouse_gene_v2.6.h5


def get_h5_download_urls() -> dict:
    """Return download URLs for ARCHS4 HDF5 expression matrices."""
    base = "https://maayanlab.cloud/archs4"
    return {
        "human_gene": f"{base}/files/human_gene_v2.6.h5",
        "mouse_gene": f"{base}/files/mouse_gene_v2.6.h5",
        "human_transcript": f"{base}/files/human_transcript_v2.6.h5",
        "mouse_transcript": f"{base}/files/mouse_transcript_v2.6.h5",
    }


urls = get_h5_download_urls()
for key, url in urls.items():
    print(f"  {key:<22} {url}")

# To work with a downloaded HDF5 file:
try:
    import h5py
    import numpy as np

    h5_path = "human_gene_v2.6.h5"  # after download

    def extract_gene_from_h5(h5_path: str, gene_symbol: str,
                             n_samples: int = 1000) -> dict:
        """Extract expression values for a gene from the HDF5 matrix."""
        with h5py.File(h5_path, "r") as f:
            # Symbols are stored as bytes; decode them for lookup
            genes = [g.decode() for g in f["meta"]["genes"]["gene_symbol"][:]]
            if gene_symbol not in genes:
                raise ValueError(f"{gene_symbol} not found in HDF5")
            idx = genes.index(gene_symbol)
            # Assumes a genes x samples layout: row idx, first n_samples columns
            expr = f["data"]["expression"][idx, :n_samples]
            sample_ids = [s.decode() for s in f["meta"]["samples"]["geo_accession"][:n_samples]]
        return {"gene": gene_symbol, "expression": expr, "sample_ids": sample_ids}

    result = extract_gene_from_h5(h5_path, "TP53", n_samples=500)
    print(f"TP53 expression: mean={result['expression'].mean():.2f},"
          f" max={result['expression'].max():.2f} (n={len(result['expression'])} samples)")
except ImportError:
    print("h5py not installed. Install with: pip install h5py")
except FileNotFoundError:
    print("HDF5 file not downloaded yet. Use the URLs above to download first.")
```
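`extract_gene_from_h5` reads only the first `n_samples` columns. To cover every sample without one very large read, the sample axis can be walked in fixed-size windows. A minimal sketch: `iter_slices` is a hypothetical helper, and the commented usage assumes the same genes × samples layout that the function above indexes with `[idx, :n_samples]`.

```python
from typing import Iterator


def iter_slices(total: int, chunk_size: int) -> Iterator[slice]:
    """Yield consecutive slice objects that together cover range(total)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total, chunk_size):
        yield slice(start, min(start + chunk_size, total))


# Hypothetical usage with an open h5py file (same layout assumption as above):
# dset = f["data"]["expression"]
# for s in iter_slices(dset.shape[1], 10_000):
#     block = dset[idx, s]   # this gene's values for the samples in window s
#     ...                    # accumulate running sums, counts, etc.
```

Reading in windows keeps memory bounded by the chunk size rather than by the total number of samples.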
Key Concepts
Z-Score Normalization
ARCHS4 reports gene expression as z-scores computed relative to all samples for that gene. A z-score of 0 means expression at that gene's mean; a z-score of 2.0 means expression 2 standard deviations above it. Because every sample passes through the same alignment pipeline, tool-driven differences between studies are removed, and z-scores then place each gene on a common, unitless scale, which makes profiles easier to compare than raw counts. Standardizing per gene does not by itself remove study-level batch effects.
```python
# Positive z-score = above-average expression for that gene
# z > 2.0  -> roughly the top 2.3% of samples (if the distribution is approximately normal)
# z < -2.0 -> roughly the bottom 2.3% of samples (same assumption)
# Use the same absolute z-score threshold when comparing across genes
```
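The arithmetic behind these thresholds, on a toy vector: standardize the values, then compute the one-sided standard normal tail for z = 2. The data are illustrative, and the tail probability only describes real genes whose expression distribution is roughly normal, which skewed RNA-seq data often is not.

```python
import math
import statistics


def zscores(values: list) -> list:
    """Standardize values: (x - mean) / population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(x - mu) / sd for x in values]


def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


print(zscores([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(f"{upper_tail(2.0):.4f}")  # 0.0228
```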
HDF5 vs REST API
| Access method | Best for | Limitations |
|---|---|---|
| REST API (`/zscore`, `/correlations`) | Quick single-gene queries, exploration | Aggregated profiles only, no per-sample access |
| REST API (`/samples/search`) | Discovering relevant datasets | Returns metadata, not expression values |
| HDF5 download | Bulk analysis, custom co-expression, ML | Requires 30–60 GB disk; download once |
Species and Gene Symbol Conventions
ARCHS4 indexes human samples using HGNC gene symbols (uppercase, e.g., `TP53`) and mouse samples using MGI symbols (first letter uppercase, e.g., `Trp53`). The `species` parameter accepts `"human"` or `"mouse"`. A symbol in the wrong case for the species, or an Ensembl ID, returns empty results.
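Because a mismatched symbol fails quietly, it can help to check queries before sending them. The sketch below is a heuristic under stated assumptions: it flags Ensembl gene IDs (`ENSG` or `ENSMUSG` followed by 11 digits) and symbols whose case does not match the usual species convention. It will misfire on some legitimate symbols, for example mouse symbols written entirely in uppercase characters such as `H2-K1`, so treat its output as a warning, not a rule. `check_query_symbol` is a hypothetical helper.

```python
import re

# Human (ENSG) or mouse (ENSMUSG) Ensembl gene IDs, optionally versioned (".12")
ENSEMBL_GENE_RE = re.compile(r"^ENS(MUS)?G\d{11}(\.\d+)?$")


def check_query_symbol(symbol: str, species: str) -> list:
    """Return heuristic warnings for a symbol that may not match ARCHS4's index."""
    warnings = []
    if ENSEMBL_GENE_RE.match(symbol):
        warnings.append("Ensembl ID given; gene endpoints expect a gene symbol")
    elif species == "human" and symbol != symbol.upper() and "orf" not in symbol:
        # HGNC symbols are uppercase except open-reading-frame names like C1orf112
        warnings.append(f"human symbols are usually uppercase, e.g. {symbol.upper()}")
    elif (species == "mouse" and len(symbol) > 1
          and symbol == symbol.upper() and any(c.isalpha() for c in symbol)):
        warnings.append(f"mouse symbols are usually capitalized, e.g. {symbol.capitalize()}")
    return warnings


print(check_query_symbol("TRP53", "mouse"))
```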
Common Workflows
Workflow 1: Multi-Gene Tissue Expression Heatmap
Goal: Compare tissue expression profiles of a gene panel and visualize as a heatmap to identify tissue-specific vs ubiquitous expression patterns.
```python
import requests, time
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"
gene_panel = ["MYC", "TP53", "BRCA1", "EGFR", "KRAS", "CDK4"]
top_n_tissues = 25


def get_tissue_zscores(gene: str) -> pd.Series:
    """Return a tissue-indexed Series of z-scores for one human gene."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene}/zscore",
        params={"species": "human"},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("values", [])
    df = pd.DataFrame(records).set_index("tissue")["zscore"]
    return df


# Build expression matrix (genes × tissues)
all_data = {}
for gene in gene_panel:
    try:
        all_data[gene] = get_tissue_zscores(gene)
        print(f"  Fetched {gene}")
    except Exception as e:
        print(f"  Warning: {gene} failed: {e}")
    time.sleep(0.1)

matrix = pd.DataFrame(all_data).T  # genes × tissues

# Select top tissues by max absolute z-score
tissue_importance = matrix.abs().max(axis=0).sort_values(ascending=False)
top_tissues = tissue_importance.head(top_n_tissues).index
matrix_subset = matrix[top_tissues]

# Plot heatmap
fig, ax = plt.subplots(figsize=(14, 5))
sns.heatmap(
    matrix_subset,
    cmap="RdBu_r",
    center=0,
    vmin=-3,
    vmax=3,
    ax=ax,
    cbar_kws={"label": "Z-Score"},
    linewidths=0.5
)
ax.set_title("ARCHS4 Tissue Expression Profiles: Gene Panel")
ax.set_xlabel("Tissue")
ax.set_ylabel("Gene")
plt.xticks(rotation=45, ha="right", fontsize=8)
plt.tight_layout()
plt.savefig("archs4_panel_heatmap.png", dpi=150, bbox_inches="tight")
print(f"Saved archs4_panel_heatmap.png ({matrix_subset.shape})")
```
Workflow 2: Co-expression Network Seed Expansion
Goal: Start from a seed gene, retrieve co-expressed partners, then query their co-expressed genes in turn to build a two-hop co-expression neighborhood.
```python
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def get_coexp(gene: str, top_n: int = 20, species: str = "human",
              min_correlation: float = 0.0) -> list:
    """Return partner symbols whose correlation with `gene` is >= min_correlation."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene}/correlations",
        params={"species": species, "limit": top_n},
        timeout=30
    )
    r.raise_for_status()
    return [
        rec["gene"] for rec in r.json().get("values", [])
        if rec.get("correlation") is not None
        and rec["correlation"] >= min_correlation
        and rec.get("gene") != gene  # drop a self-correlation entry if present
    ]


seed_gene = "PCNA"
min_correlation = 0.80

# Hop 1: direct co-expressed partners above the correlation cutoff
hop1_genes = get_coexp(seed_gene, top_n=30, min_correlation=min_correlation)
print(f"Hop 1 partners of {seed_gene}: {len(hop1_genes)}")
time.sleep(0.1)

# Hop 2: co-expressed genes of each partner
edges = set()
for gene in hop1_genes[:10]:  # limit for demonstration
    partners = get_coexp(gene, top_n=20, min_correlation=min_correlation)
    for partner in partners:
        if partner != seed_gene:
            edges.add((gene, partner))
    time.sleep(0.1)

# Summarize the network. Each source contributes at most top_n rows by
# construction, so rank hubs by how often a gene appears in either column.
network_df = pd.DataFrame(sorted(edges), columns=["source", "target"])
degree = pd.concat([network_df["source"], network_df["target"]]).value_counts()
print(f"\nTwo-hop network: {len(edges)} edges")
print("Top hub genes:")
print(degree.head(5))

network_df.to_csv(f"{seed_gene}_coexp_network.csv", index=False)
print(f"\nSaved {seed_gene}_coexp_network.csv")
```
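Correlation is symmetric, yet the hop-2 loop can record both (A, B) and (B, A) when two partners list each other, which counts one link twice in any degree or hub ranking. A minimal stdlib sketch that collapses those pairs before counting; `undirected_degrees` is a hypothetical helper and the toy edges are illustrative, not ARCHS4 output.

```python
from collections import Counter


def undirected_degrees(edges) -> Counter:
    """Collapse (A, B)/(B, A) duplicates and self-loops, then count edges per gene."""
    unique = {tuple(sorted(pair)) for pair in edges if pair[0] != pair[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    return degree


toy_edges = {("RFC4", "MCM6"), ("MCM6", "RFC4"), ("RFC4", "RFC2"), ("MCM6", "MCM2")}
degree = undirected_degrees(toy_edges)
print(degree["RFC4"], degree["MCM6"], degree["RFC2"])  # 2 2 1
```

With the real network, `undirected_degrees(edges).most_common(5)` gives a hub ranking in which each symmetric link is counted once.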
Workflow 3: Sample Discovery and Dataset Summary
Goal: Search for samples by disease keyword, summarize how many GEO series are available, and export sample metadata for downstream reanalysis selection.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def search_and_summarize(keyword: str, species: str = "human",
                         limit: int = 200) -> pd.DataFrame:
    """Search samples and return a tidy metadata DataFrame."""
    r = requests.get(
        f"{ARCHS4_BASE}/samples/search",
        params={"query": keyword, "species": species, "limit": limit},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("samples", [])
    return pd.DataFrame(records)


keyword = "colorectal cancer"
df = search_and_summarize(keyword, limit=150)
print(f"Samples matching '{keyword}': {len(df)}")

if len(df) > 0:
    # Summarize by GEO series
    series_counts = df["series_id"].value_counts()
    print("\nTop GEO series (by sample count):")
    print(series_counts.head(8).to_string())

    # Export sample list
    out_file = f"{keyword.replace(' ', '_')}_samples.csv"
    df.to_csv(out_file, index=False)
    print(f"\nSaved {out_file} ({len(df)} samples)")
    print(f"Unique GEO series: {df['series_id'].nunique()}")
```
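For reanalysis it is often useful to keep only GEO series with enough samples to model within-study variation. A stdlib sketch over the same `sample_id` and `series_id` fields used above; `series_with_min_samples` is a hypothetical helper and the cutoff is a placeholder to tune, not an ARCHS4 recommendation.

```python
from collections import defaultdict


def series_with_min_samples(records: list, min_samples: int = 6) -> dict:
    """Group sample records by series_id; keep series with >= min_samples samples."""
    groups = defaultdict(list)
    for rec in records:
        sid = rec.get("series_id")
        # Skip missing series: None, empty strings, and pandas NaN (a float) all fail this check
        if isinstance(sid, str) and sid:
            groups[sid].append(rec.get("sample_id"))
    return {sid: ids for sid, ids in groups.items() if len(ids) >= min_samples}
```

With the DataFrame above, a call such as `series_with_min_samples(df.to_dict("records"), min_samples=6)` returns a `{series_id: [sample_id, ...]}` mapping of the series that pass the cutoff.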
Key Parameters
| Parameter | Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
| `species` | All gene endpoints | `"human"` | `"human"`, `"mouse"` | Selects the species-specific sample index |
| `limit` | `/correlations`, `/samples/search` | 100 | 1–500 | Number of results returned |
| `gene_symbol` (path) | `/meta/genes/{gene}/zscore`, `/correlations` | none | HGNC symbol (human) or MGI symbol (mouse) | Query gene; case-sensitive |
| `query` | `/samples/search` | none | free-text string | Metadata keyword search across title, tissue, and source fields |
| `offset` | `/samples/search` | 0 | integer | Pagination offset for large result sets |
| `correlation` (response field) | `/correlations` | none | -1.0 to 1.0 | Pearson correlation coefficient; filter > 0.7 for high co-expression |
| `zscore` (response field) | `/zscore` | none | continuous float | Expression z-score; > 2.0 = high expression |
| `n_samples` (HDF5) | `extract_gene_from_h5()` | 1000 | positive integer | Number of leading samples read from the HDF5 matrix |
Best Practices
- Use z-score thresholds consistently: Because z-scores are gene-specific, a z-score of 2.0 means something different for a ubiquitous gene (GAPDH) than for a tissue-restricted gene (TTR, liver). Always note which gene is being compared and against which tissue background.
- Sleep between batch queries: ARCHS4 enforces a soft rate limit of ~10 requests/second. Add `time.sleep(0.1)` between sequential gene queries to avoid `429 Too Many Requests` errors.
- Download HDF5 for large-scale analyses: For queries covering 50+ genes or requiring per-sample expression values, the REST API is impractical. Download the HDF5 file once and use `h5py` slicing for fast matrix access; this avoids rate limits entirely and is far faster than issuing one HTTP request per gene.
- Match gene symbol conventions by species: Human queries require HGNC uppercase symbols (e.g., `TP53`); mouse queries require MGI-style symbols (e.g., `Trp53`). Using the wrong case returns empty results without an error.
- Validate co-expression findings across datasets: ARCHS4 co-expression aggregates across all tissue types, so a high correlation may be driven by a single tissue or study. Cross-check with tissue-specific queries or inspect the top contributing GEO series.
Common Recipes
Recipe: Quick Tissue Specificity Check
When to use: Rapidly determine whether a gene is broadly expressed (housekeeping) or tissue-restricted before designing experiments.
```python
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def tissue_specificity_summary(gene_symbol: str) -> None:
    """Print a summary of high and low expression tissues for a gene."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/zscore",
        params={"species": "human"},
        timeout=30
    )
    r.raise_for_status()
    # Keep only records that carry a z-score, so sorting and min/max are safe
    records = [rec for rec in r.json().get("values", []) if rec.get("zscore") is not None]
    if not records:
        print(f"\n{gene_symbol}: no tissue z-scores returned")
        return

    zscores = [rec["zscore"] for rec in records]
    top_high = sorted(records, key=lambda x: x["zscore"], reverse=True)[:5]
    top_low = sorted(records, key=lambda x: x["zscore"])[:3]

    print(f"\n{gene_symbol}: {len(zscores)} tissues")
    print(f"  Range: [{min(zscores):.2f}, {max(zscores):.2f}]  "
          f"Mean: {sum(zscores)/len(zscores):.2f}")
    print("  High expression:")
    for t in top_high:
        print(f"    {t['tissue']:<35} z={t['zscore']:.2f}")
    print("  Low expression:")
    for t in top_low:
        print(f"    {t['tissue']:<35} z={t['zscore']:.2f}")


tissue_specificity_summary("TTR")  # Transthyretin: largely liver-restricted
```
Recipe: Batch Gene Co-Expression Table
When to use: Build a table of the top co-expressed partners for each gene in a list of differentially expressed genes.
```python
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"


def batch_coexpr_table(gene_list: list, top_n: int = 10) -> pd.DataFrame:
    """For each gene in gene_list, return its top co-expressed genes."""
    rows = []
    for gene in gene_list:
        try:
            r = requests.get(
                f"{ARCHS4_BASE}/meta/genes/{gene}/correlations",
                params={"species": "human", "limit": top_n},
                timeout=30
            )
            r.raise_for_status()
            for rec in r.json().get("values", []):
                rows.append({
                    "query_gene": gene,
                    "coexp_gene": rec.get("gene"),
                    "correlation": rec.get("correlation"),
                })
        except Exception as e:
            print(f"Warning: {gene} skipped: {e}")
        finally:
            time.sleep(0.1)  # throttle after failures too, not only successes
    return pd.DataFrame(rows)


deg_list = ["MYC", "CCND1", "CDK4", "RB1", "E2F1"]
coexp_table = batch_coexpr_table(deg_list, top_n=10)
print(f"Co-expression entries: {len(coexp_table)}")
print(coexp_table.groupby("query_gene")["coexp_gene"].count())

coexp_table.to_csv("deg_coexpression_table.csv", index=False)
print("Saved deg_coexpression_table.csv")
```
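The table lists each query gene's own partners but does not show how the query genes relate to one another. One simple heuristic, assumed here rather than taken from ARCHS4, is to compare partner lists pairwise using shared partners and Jaccard overlap. `shared_partners` is a hypothetical helper; overlap of top-N lists is descriptive, not a significance test.

```python
from itertools import combinations


def shared_partners(partners_by_gene: dict) -> list:
    """For each pair of query genes, report shared partners and Jaccard overlap."""
    rows = []
    for a, b in combinations(sorted(partners_by_gene), 2):
        sa, sb = set(partners_by_gene[a]), set(partners_by_gene[b])
        union = sa | sb
        rows.append({
            "gene_a": a,
            "gene_b": b,
            "shared": sorted(sa & sb),
            "jaccard": len(sa & sb) / len(union) if union else 0.0,
        })
    return rows


# Hypothetical input built from the table above:
# partners_by_gene = coexp_table.groupby("query_gene")["coexp_gene"].apply(list).to_dict()
```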
Recipe: Export Sample IDs for GEO Download
When to use: Identify relevant GEO accessions to download raw count matrices for a meta-analysis.
```python
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"
keyword = "glioblastoma"

r = requests.get(
    f"{ARCHS4_BASE}/samples/search",
    params={"query": keyword, "species": "human", "limit": 200},
    timeout=30
)
r.raise_for_status()
samples = pd.DataFrame(r.json().get("samples", []))

if len(samples) > 0:
    # Get unique GEO series accessions
    series = samples["series_id"].dropna().unique()
    print(f"Unique GEO series for '{keyword}': {len(series)}")
    for s in series[:10]:
        n = (samples["series_id"] == s).sum()
        print(f"  {s} ({n} samples)")

    # Export series list for GEO download script
    pd.Series(series, name="geo_series").to_csv(
        f"{keyword}_geo_series.txt", index=False
    )
    print(f"\nSaved {keyword}_geo_series.txt")
```
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| HTTP 404 for gene query | Gene symbol not found in ARCHS4 index | Verify HGNC symbol spelling; check that the species parameter matches the gene convention (human: uppercase, mouse: first letter uppercase) |
| HTTP 429 Too Many Requests | Exceeded ~10 req/s rate limit | Add `time.sleep(0.1)` between requests; for batch queries use a 0.5 s delay |
| Empty `values` list in z-score response | Gene is not expressed in any indexed tissue, or wrong species | Switch species; verify the gene is protein-coding and has GEO coverage |
| Empty `samples` list from search | Keyword not matched in metadata fields | Try broader or alternative keywords (e.g., "liver" instead of "hepatic") |
| HDF5 gene not found | Symbol mismatch between HDF5 version and query | Check available genes in `f["meta"]["genes"]["gene_symbol"][:]`; try an Ensembl ID or alias |
| `requests.exceptions.Timeout` | Slow API response under load | Increase to `timeout=60`; retry with exponential backoff |
| Z-scores all near zero | Gene has very low or absent expression across tissues, or is expressed at a similar level everywhere | Check the gene's raw counts; the gene may be non-coding, very lowly expressed, or uniformly expressed |
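The table above recommends retrying timeouts with exponential backoff. A minimal sketch of that pattern, assuming HTTP 429/5xx responses and timeouts are the failures worth retrying: the delay doubles on each attempt, plus a small random jitter. `with_backoff` is a hypothetical helper; `sleep` is injectable so the retry logic can be exercised without real waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T], is_retryable: Callable[[Exception], bool],
                 max_retries: int = 5, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on a retryable error wait base_delay * 2**attempt (+ jitter) and retry."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == max_retries or not is_retryable(exc):
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
    raise RuntimeError("max_retries must be >= 0")


# Hypothetical wiring with requests:
# def is_retryable(exc):
#     if isinstance(exc, requests.exceptions.Timeout):
#         return True
#     resp = getattr(exc, "response", None)
#     return resp is not None and resp.status_code in (429, 500, 502, 503, 504)
#
# def fetch():
#     r = requests.get(f"{ARCHS4_BASE}/meta/genes/TP53/zscore", timeout=60)
#     r.raise_for_status()
#     return r.json()
#
# data = with_backoff(fetch, is_retryable)
```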
Related Skills
- `gnomad-database`: Population variant frequencies; use after ARCHS4 to identify variants in highly expressed genes
- `gget-genomic-databases`: Enrichr pathway enrichment for ARCHS4 co-expression gene lists (`gget enrichr`)
- `pydeseq2-differential-expression`: Differential expression analysis on bulk RNA-seq; ARCHS4 HDF5 matrices can serve as reference cohorts
References
- ARCHS4 web portal — Interactive expression browser and dataset download
- ARCHS4 REST API documentation — Endpoint reference and parameters
- Lachmann et al., Nature Communications 2018 — ARCHS4 original publication describing uniform alignment pipeline
- ARCHS4 GitHub — Source code and HDF5 schema documentation