name: locus-to-gene-mapper-skill
description: Map GWAS loci to ranked candidate genes using a deterministic multi-skill chain (EFO -> GWAS -> coordinates -> Open Targets L2G/coloc -> eQTL -> burden/coding context), with reproducible tables and optional figures. Use when a user provides a trait/EFO term and/or lead variants and needs locus-to-gene prioritization for downstream biology decisions.
Locus-to-Gene Mapper
Generate a reproducible locus-to-gene mapping for one trait (or a seed set of lead variants), with explicit evidence attribution and conservative confidence labels.
This skill is optimized for bioinformaticians who need executable, traceable mapping from variant signals to plausible causal genes.
Required Inputs
Provide at least one anchor source:
- trait_query (string), for example chronic obstructive pulmonary disease
- efo_id (string), for example EFO_0000341
- seed_rsids (list[string]), for example ["rs1873625", "rs7903146"]
Optional Inputs
- target_gene (string), optional gene of interest for highlighting in output
- show_child_traits (bool), default true
- phenotype_terms (list[string]), optional additional terms to include when finding anchors
- max_anchor_associations (int), default 1200
- max_loci (int), default 25
- max_genes_per_locus (int), default 10
- max_coloc_rows_per_locus (int), default 100
- max_eqtl_rows_per_variant (int), default 200
- genebass_burden_sets (list[string]), default ["pLoF", "missense|LC"]
- include_clinvar (bool), default true
- include_gnomad_context (bool), default true
- include_hpa_tissue_context (bool), default true
- include_figures (bool), default false
- disable_default_seeds (bool), default false; if false, common traits automatically get built-in seed rsIDs
- figure_output_dir (string), default ./output/figures
- mapping_output_path (string), default ./output/locus_to_gene_mapping.json
- summary_output_path (string), default ./output/locus_to_gene_summary.md
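The optional inputs and their defaults can be pictured as a simple overlay of user-supplied keys on a defaults table. The sketch below is illustrative only: `DEFAULTS` and `with_defaults` are hypothetical names, not part of the bundled script, and the sketch does not model the trait-only override of include_figures described under Run.

```python
from copy import deepcopy

# Defaults copied from the Optional Inputs list; the dict itself is a hypothetical helper.
DEFAULTS = {
    "show_child_traits": True,
    "max_anchor_associations": 1200,
    "max_loci": 25,
    "max_genes_per_locus": 10,
    "max_coloc_rows_per_locus": 100,
    "max_eqtl_rows_per_variant": 200,
    "genebass_burden_sets": ["pLoF", "missense|LC"],
    "include_clinvar": True,
    "include_gnomad_context": True,
    "include_hpa_tissue_context": True,
    "include_figures": False,
    "disable_default_seeds": False,
    "figure_output_dir": "./output/figures",
    "mapping_output_path": "./output/locus_to_gene_mapping.json",
    "summary_output_path": "./output/locus_to_gene_summary.md",
}


def with_defaults(user_input: dict) -> dict:
    """Overlay user-supplied keys on the documented defaults (assumed behavior)."""
    merged = deepcopy(DEFAULTS)  # deepcopy so list defaults are never shared
    merged.update(user_input)
    return merged


config = with_defaults({"trait_query": "type 2 diabetes", "max_loci": 5})
```

Here `config` keeps every documented default except `max_loci`, which takes the user value.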
Runtime Requirements
- Python 3.11+
- requests
- Optional for figure generation: matplotlib, seaborn, pandas
Bundled Script (Deterministic Runner)
- Primary entrypoint: scripts/map_locus_to_gene.py
- This script:
  - resolves trait/EFO and anchor variants,
  - gathers locus-to-gene evidence through the chained skills,
  - writes mapping JSON and summary markdown,
  - optionally renders figures when plotting deps are available.
Run:
python locus-to-gene-mapper-skill/scripts/map_locus_to_gene.py \
  --input-json /path/to/input.json \
  --print-result
Quick start (no input JSON file):
python locus-to-gene-mapper-skill/scripts/map_locus_to_gene.py \
  --trait-query "type 2 diabetes" \
  --print-result
Trait-only runs override the general default of include_figures=false: figures are generated unless explicitly disabled with --no-include-figures.
Minimal input JSON:
{
  "trait_query": "type 2 diabetes"
}
Built-in default seeds (when disable_default_seeds=false):
- type 2 diabetes / t2d -> rs7903146, rs13266634, rs7756992, rs5219, rs1801282, rs4402960
- coronary artery disease / cad -> rs1333049, rs4977574, rs9349379, rs6725887, rs1746048, rs3184504
- body mass index / bmi -> rs9939609, rs17782313, rs6548238, rs10938397, rs7498665, rs7138803
- asthma -> rs7216389, rs2305480, rs9273349
- rheumatoid arthritis -> rs2476601, rs3761847, rs660895
- alzheimer disease -> rs429358, rs7412, rs6733839, rs11136000, rs3851179
- ldl cholesterol / total cholesterol -> rs7412, rs429358, rs6511720, rs629301, rs12740374, rs11591147
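One way to picture the default-seed lookup is a small alias table keyed by lowercased trait name. This is a sketch under assumptions: the names `DEFAULT_SEEDS` and `default_seeds_for` are hypothetical, and exact matching on a whitespace-normalized, lowercased trait string is an assumed rule, since the matching logic used by the bundled script is not described here. The rsIDs are copied from the list above.

```python
# Alias groups and rsIDs copied from the built-in seed list; structure is hypothetical.
_SEED_GROUPS = [
    (("type 2 diabetes", "t2d"),
     ["rs7903146", "rs13266634", "rs7756992", "rs5219", "rs1801282", "rs4402960"]),
    (("coronary artery disease", "cad"),
     ["rs1333049", "rs4977574", "rs9349379", "rs6725887", "rs1746048", "rs3184504"]),
    (("body mass index", "bmi"),
     ["rs9939609", "rs17782313", "rs6548238", "rs10938397", "rs7498665", "rs7138803"]),
    (("asthma",), ["rs7216389", "rs2305480", "rs9273349"]),
    (("rheumatoid arthritis",), ["rs2476601", "rs3761847", "rs660895"]),
    (("alzheimer disease",),
     ["rs429358", "rs7412", "rs6733839", "rs11136000", "rs3851179"]),
    (("ldl cholesterol", "total cholesterol"),
     ["rs7412", "rs429358", "rs6511720", "rs629301", "rs12740374", "rs11591147"]),
]

# Flatten so that every alias points at the same seed list.
DEFAULT_SEEDS = {alias: seeds for aliases, seeds in _SEED_GROUPS for alias in aliases}


def default_seeds_for(trait_query: str, disable_default_seeds: bool = False) -> list[str]:
    """Return built-in seeds for a trait, or an empty list when none apply."""
    if disable_default_seeds:
        return []
    key = " ".join(trait_query.lower().split())  # lowercase and collapse whitespace
    return list(DEFAULT_SEEDS.get(key, []))
```

An unknown trait yields an empty list rather than a guess, in line with the non-invention rules.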
Autonomous Execution Contract (Embedded Behavior)
When a user asks for locus-to-gene mapping and gives only a trait (for example, type 2 diabetes), do the following automatically:
- Run the bundled script with --trait-query "<user_trait>" --print-result (no manual JSON required).
- If it returns "No anchors remained", rerun once with a built-in default seed rsID for that trait (unless disable_default_seeds=true).
- Read the generated mapping_output_path and summary_output_path.
- Return this concise response structure:
  - Top 5 cross-locus prioritized genes
  - Per-locus top gene (score, confidence)
  - Visualization artifact (figure path(s) or Mermaid fallback block)
  - Warnings and limitations
- For inline image rendering in chat:
  - read inline_image_markdown from script result
  - emit those lines exactly as plain markdown (no code fences)
  - if inline rendering still fails, instruct user to upload PNG files into the chat
Do not ask the user to run python manually unless execution is actually blocked.
Skill Chaining Order (Mandatory)
Use these skills in order. Skip only when an earlier step is not needed by provided inputs.
- efo-ontology-skill
  - Resolve trait_query to canonical EFO term and synonyms.
  - Expand descendants when show_child_traits=true.
- gwas-catalog-skill
  - Discover anchor variants for the trait/EFO scope.
  - Pull association/study metadata for locus context.
- variant-coordinate-finder-skill
  - Normalize each anchor to rsID plus GRCh37/GRCh38 coordinates.
- opentargets-skill
  - Retrieve credible set context, L2G predictions, and colocalisation evidence per locus.
- gtex-eqtl-skill
  - Retrieve single-tissue eQTL support for anchor variants.
- genebass-gene-burden-skill
  - Retrieve rare-variant burden support for candidate genes.
- clinvar-variation-skill (when include_clinvar=true)
  - Add variant clinical/coding annotations.
- gnomad-graphql-skill (when include_gnomad_context=true)
  - Add frequency and gene-level constraint context.
- human-protein-atlas-skill (when include_hpa_tissue_context=true)
  - Add tissue plausibility context for top genes.
Never perform additional retrieval after final candidate-gene scoring starts.
Output Contract (Required)
Always return:
- locus_to_gene_mapping.json
- locus_to_gene_summary.md
JSON contract
{
  "meta": {
    "trait_query": "...",
    "efo_id": "EFO_...",
    "generated_at": "ISO-8601",
    "sources_queried": []
  },
  "anchors": [
    {
      "rsid": "rs...",
      "grch38": {"chr": "3", "pos": 49629531, "ref": "A", "alt": "C"},
      "lead_trait": "...",
      "p_value": 2e-11,
      "cohort": "..."
    }
  ],
  "loci": [
    {
      "locus_id": "chr3:49000000-50200000",
      "lead_rsid": "rs...",
      "candidate_genes": [
        {
          "symbol": "MST1",
          "ensembl_id": "ENSG...",
          "overall_score": 0.71,
          "confidence": "High|Medium|Low|VeryLow",
          "evidence": {
            "l2g_max": 0.83,
            "coloc_max_h4": 0.84,
            "eqtl_tissues": ["Lung"],
            "rare_variant_support": "none|nominal|strong",
            "coding_support": "none|noncoding|coding",
            "clinvar_support": "none|present",
            "gnomad_context": "...",
            "hpa_tissue_support": ["lung"]
          },
          "rationale": [
            "..."
          ],
          "limitations": [
            "..."
          ]
        }
      ]
    }
  ],
  "cross_locus_ranked_genes": [
    {
      "symbol": "...",
      "supporting_loci": 3,
      "mean_score": 0.62,
      "max_score": 0.81
    }
  ],
  "warnings": [],
  "limitations": []
}
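To show how a consumer might read this structure, the sketch below extracts the highest-scoring candidate gene per locus from an already-parsed mapping. The function name `top_gene_per_locus` is hypothetical, and the second gene in the example (`GENE_B`) and its score are placeholder values for illustration, not real results.

```python
def top_gene_per_locus(mapping: dict) -> list[tuple[str, str, float, str]]:
    """Return (locus_id, symbol, overall_score, confidence) for each locus's best gene."""
    rows = []
    for locus in mapping.get("loci", []):
        genes = locus.get("candidate_genes", [])
        if not genes:
            continue  # a locus without candidates contributes no row
        best = max(genes, key=lambda g: g["overall_score"])
        rows.append(
            (locus["locus_id"], best["symbol"], best["overall_score"], best["confidence"])
        )
    return rows


# Minimal mapping following the contract; GENE_B is a placeholder entry.
example_mapping = {
    "loci": [
        {
            "locus_id": "chr3:49000000-50200000",
            "candidate_genes": [
                {"symbol": "MST1", "overall_score": 0.71, "confidence": "Medium"},
                {"symbol": "GENE_B", "overall_score": 0.40, "confidence": "Low"},
            ],
        }
    ]
}

rows = top_gene_per_locus(example_mapping)
```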
Markdown summary contract
The summary must include sections in this exact order:
- Objective
- Inputs and scope
- Anchor variant summary
- Per-locus top genes
- Cross-locus prioritized genes
- Key caveats
- Recommended next analyses
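The exact-order requirement lends itself to a mechanical check. The sketch below assumes that each section appears in the summary as a line starting with `## `; that heading prefix is an assumption, not something this contract specifies, and `section_order_ok` is a hypothetical helper name.

```python
REQUIRED_SECTIONS = [
    "Objective",
    "Inputs and scope",
    "Anchor variant summary",
    "Per-locus top genes",
    "Cross-locus prioritized genes",
    "Key caveats",
    "Recommended next analyses",
]


def section_order_ok(summary_md: str, heading_prefix: str = "## ") -> bool:
    """True when the summary's headings match the required list exactly, in order."""
    headings = [
        line[len(heading_prefix):].strip()
        for line in summary_md.splitlines()
        if line.startswith(heading_prefix)  # assumed heading convention
    ]
    return headings == REQUIRED_SECTIONS


good_summary = "\n\n".join(f"## {name}\nplaceholder text" for name in REQUIRED_SECTIONS)
bad_summary = "\n\n".join(f"## {name}\nplaceholder text" for name in reversed(REQUIRED_SECTIONS))
```

A missing or extra section fails the check as well, because the comparison is against the full list.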
Optional Figure Contract
Only produce figures when include_figures=true.
If figures are generated, append this block to JSON:
{
  "figures": [
    {
      "id": "locus_gene_heatmap",
      "path": "./output/figures/locus_gene_heatmap.png",
      "caption": "Top candidate genes by evidence component across loci"
    }
  ]
}
Recommended figure set:
- locus_gene_heatmap.png
  - Rows: top genes, columns: evidence components (L2G, coloc, eQTL, burden, coding).
- locus_score_decomposition.png
  - Stacked bars per locus for top 3 genes.
- tissue_support_dotplot.png
  - Gene-by-tissue evidence dots from GTEx/HPA context.
If plotting dependencies are unavailable, skip PNG generation and output Mermaid diagrams in markdown as fallback.
The script also returns inline_image_markdown and render_instructions fields to support inline chat rendering.
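The PNG-versus-Mermaid fallback can be decided before any plotting code is imported. The following sketch uses the standard library's `importlib.util.find_spec` to detect missing plotting packages; the helper names and the three mode strings are hypothetical and are not claimed to match the bundled script.

```python
import importlib.util

PLOTTING_DEPS = ("matplotlib", "seaborn", "pandas")


def missing_plotting_deps(deps: tuple[str, ...] = PLOTTING_DEPS) -> list[str]:
    """List the packages that cannot be located in the current environment."""
    return [name for name in deps if importlib.util.find_spec(name) is None]


def choose_figure_mode(include_figures: bool, missing: list[str]) -> str:
    """Return 'none', 'png', or 'mermaid' (hypothetical mode labels)."""
    if not include_figures:
        return "none"  # figures are only produced when include_figures=true
    return "mermaid" if missing else "png"


mode = choose_figure_mode(include_figures=True, missing=missing_plotting_deps())
```

Checking availability first avoids a partially rendered figure set when only some dependencies are installed.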
Scoring Rules (Deterministic)
For each candidate gene per locus, compute:
- l2g_component: max L2G score for the gene in locus (0..1)
- coloc_component: max h4 (or clpp when only CLPP is available), clipped to 0..1
- eqtl_component: min(1, relevant_tissue_hits / 3)
- burden_component:
  - 1.0 if burden p < 2.5e-6
  - 0.6 if 2.5e-6 <= p < 0.05
  - 0.0 otherwise
- coding_component:
  - 1.0 for coding consequence in target gene with supportive ClinVar annotation
  - 0.6 for coding consequence in target gene without supportive ClinVar annotation
  - 0.3 for noncoding-in-gene support only
  - 0.0 otherwise
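The component rules above translate directly into small pure functions. This is a minimal sketch: the function and parameter names are hypothetical, and treating a missing burden p-value as 0.0 is an assumption chosen to match the rule that missing evidence is never counted as positive support.

```python
from typing import Optional


def clip01(value: float) -> float:
    """Clip a value into the 0..1 range (used for l2g and coloc components)."""
    return max(0.0, min(1.0, value))


def eqtl_component(relevant_tissue_hits: int) -> float:
    """Saturates at 1.0 once three relevant tissues show eQTL support."""
    return min(1.0, relevant_tissue_hits / 3)


def burden_component(p: Optional[float]) -> float:
    if p is None:
        return 0.0  # assumption: no burden result means no support
    if p < 2.5e-6:
        return 1.0
    if p < 0.05:
        return 0.6
    return 0.0


def coding_component(coding: bool, clinvar_supportive: bool, noncoding_in_gene: bool) -> float:
    if coding:
        return 1.0 if clinvar_supportive else 0.6
    if noncoding_in_gene:
        return 0.3
    return 0.0
```

Note that the boundaries are half-open: a burden p-value of exactly 2.5e-6 scores 0.6, and exactly 0.05 scores 0.0.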
Overall score:
overall_score = 0.40*l2g + 0.25*coloc + 0.15*eqtl + 0.10*burden + 0.10*coding
Confidence label:
- High if score >= 0.75
- Medium if 0.55 <= score < 0.75
- Low if 0.35 <= score < 0.55
- VeryLow if score < 0.35
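As a worked illustration of the weighted sum and the label thresholds, the sketch below combines five component values into an overall score. The names `WEIGHTS`, `overall_score`, and `confidence_label` are hypothetical, and the component values in the example are made up for the arithmetic only.

```python
# Weights taken from the overall_score formula; they sum to 1.0.
WEIGHTS = {"l2g": 0.40, "coloc": 0.25, "eqtl": 0.15, "burden": 0.10, "coding": 0.10}


def overall_score(components: dict) -> float:
    """Weighted sum of component scores; a missing component counts as 0.0."""
    return sum(weight * components.get(name, 0.0) for name, weight in WEIGHTS.items())


def confidence_label(score: float) -> str:
    if score >= 0.75:
        return "High"
    if score >= 0.55:
        return "Medium"
    if score >= 0.35:
        return "Low"
    return "VeryLow"


# Illustrative component values, not real evidence.
example = {"l2g": 0.8, "coloc": 0.8, "eqtl": 1.0, "burden": 0.6, "coding": 0.0}
score = overall_score(example)   # 0.32 + 0.20 + 0.15 + 0.06 + 0.00, about 0.73
label = confidence_label(score)  # falls in the Medium band
```

Because every component lies in 0..1 and the weights sum to 1.0, the overall score is bounded to 0..1 by construction.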
Pipeline Contract
Phase 0: Validate and normalize input
- Enforce that at least one of trait_query, efo_id, seed_rsids is present.
- Normalize rsID formatting and deduplicate seed variants.
- Resolve free-text trait to one canonical EFO term when needed.
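The first two Phase 0 checks can be sketched as follows. The helper names are hypothetical, and the normalization rule (trim whitespace, lowercase, require the form rs followed by digits) is an assumption about what "normalize rsID formatting" means; EFO resolution is left out because it depends on the upstream skill.

```python
import re

RSID_PATTERN = re.compile(r"^rs\d+$")  # assumed canonical form


def normalize_seed_rsids(seed_rsids: list[str]) -> list[str]:
    """Trim, lowercase, validate, and deduplicate rsIDs while preserving order."""
    seen, cleaned = set(), []
    for raw in seed_rsids:
        rsid = raw.strip().lower()
        if not RSID_PATTERN.match(rsid):
            raise ValueError(f"Not a valid rsID: {raw!r}")
        if rsid not in seen:
            seen.add(rsid)
            cleaned.append(rsid)
    return cleaned


def validate_input(payload: dict) -> dict:
    """Require at least one anchor source and return a copy with clean seeds."""
    if not any(payload.get(key) for key in ("trait_query", "efo_id", "seed_rsids")):
        raise ValueError("Provide at least one of trait_query, efo_id, seed_rsids")
    cleaned = dict(payload)
    cleaned["seed_rsids"] = normalize_seed_rsids(payload.get("seed_rsids") or [])
    return cleaned


result = validate_input({"seed_rsids": [" RS7903146", "rs7903146", "rs5219"]})
```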
Phase 1: Build anchor set
- If trait/EFO input is provided, pull associations and rank anchors by p-value and effect availability.
- Merge trait-derived anchors with user-supplied seed_rsids.
- Cap anchors using max_loci and log dropped anchors in warnings.
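A rough sketch of the Phase 1 steps is shown below. Several details are assumptions made for illustration: user-supplied seeds are placed ahead of trait-derived anchors, ties on p-value prefer anchors that have an effect size, and the `effect_size` field name is hypothetical. The p-values in the example are placeholders, not real association statistics.

```python
def build_anchor_set(trait_anchors: list[dict], seed_rsids: list[str], max_loci: int):
    """Rank, merge, and cap anchors; return (kept_rsids, warnings)."""
    # Smaller p-value first; on ties, anchors with an effect size come first.
    ranked = sorted(
        trait_anchors,
        key=lambda a: (a["p_value"], a.get("effect_size") is None),
    )
    # Assumption: user seeds are retained ahead of trait-derived anchors.
    ordered = list(seed_rsids) + [a["rsid"] for a in ranked]
    merged = list(dict.fromkeys(ordered))  # order-preserving deduplication
    kept, dropped = merged[:max_loci], merged[max_loci:]
    warnings = []
    if dropped:
        warnings.append(
            f"Dropped {len(dropped)} anchor(s) beyond max_loci={max_loci}: "
            + ", ".join(dropped)
        )
    return kept, warnings


# Placeholder statistics for illustration only.
trait_anchors = [
    {"rsid": "rs13266634", "p_value": 1e-9, "effect_size": 0.1},
    {"rsid": "rs7903146", "p_value": 1e-12, "effect_size": None},
    {"rsid": "rs5219", "p_value": 1e-9, "effect_size": None},
]
kept, warnings = build_anchor_set(trait_anchors, ["rs5219"], max_loci=2)
```

Dropped anchors are named in the warning text so that the cap never removes a variant silently.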
Phase 2: Gather locus-to-gene evidence
- Normalize anchor coordinates (both builds when possible).
- Pull Open Targets locus evidence (credible set/L2G/coloc).
- Pull GTEx variant-level eQTL rows.
- Pull gene-level burden results for mapped candidate genes.
- Pull ClinVar and gnomAD context when enabled.
Phase 3: Harmonize and score
- Build a per-locus candidate-gene table.
- Compute deterministic component scores and overall score.
- Create cross-locus aggregate rankings.
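The cross-locus aggregation can be sketched as a group-by over gene symbols, producing the supporting_loci, mean_score, and max_score fields from the JSON contract. The function name is hypothetical, each gene is assumed to appear at most once per locus, and the sort order (max score, then number of supporting loci, then symbol) is an assumption because the ranking key is not specified here. Gene names and scores in the example are placeholders.

```python
from collections import defaultdict
from statistics import mean


def cross_locus_ranking(loci: list[dict]) -> list[dict]:
    """Aggregate per-locus scores into one row per gene symbol."""
    scores_by_gene = defaultdict(list)
    for locus in loci:
        for gene in locus.get("candidate_genes", []):
            scores_by_gene[gene["symbol"]].append(gene["overall_score"])
    rows = [
        {
            "symbol": symbol,
            "supporting_loci": len(scores),
            "mean_score": mean(scores),
            "max_score": max(scores),
        }
        for symbol, scores in scores_by_gene.items()
    ]
    # Assumed ordering: best max score first, symbol as a deterministic tie-break.
    rows.sort(key=lambda r: (-r["max_score"], -r["supporting_loci"], r["symbol"]))
    return rows


# Placeholder genes and scores for illustration only.
loci = [
    {"candidate_genes": [{"symbol": "GENE_A", "overall_score": 0.8},
                         {"symbol": "GENE_B", "overall_score": 0.7}]},
    {"candidate_genes": [{"symbol": "GENE_A", "overall_score": 0.6}]},
]
ranking = cross_locus_ranking(loci)
```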
Phase 4: Synthesize outputs
- Write JSON mapping file.
- Write markdown summary in exact section order.
- Optionally generate figures and append figures metadata.
Phase 5: QC gates
Fail the run when any of the following occurs:
- No anchors after normalization.
- Any locus has candidate genes without score fields.
- overall_score outside 0..1.
- Summary section order mismatch.
- Claim of causality without explicit evidence support in rationale text.
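Three of these gates can be checked mechanically against the mapping JSON, as in the sketch below. It is illustrative only: `qc_failures` is a hypothetical name, "score fields" is assumed to mean overall_score and confidence, and the section-order and causality-claim gates are not covered because they operate on the summary and rationale text.

```python
REQUIRED_GENE_FIELDS = ("overall_score", "confidence")  # assumed meaning of "score fields"


def qc_failures(mapping: dict) -> list[str]:
    """Return human-readable QC failures; an empty list means these gates pass."""
    failures = []
    if not mapping.get("anchors"):
        failures.append("No anchors after normalization")
    for locus in mapping.get("loci", []):
        locus_id = locus.get("locus_id", "<unknown locus>")
        for gene in locus.get("candidate_genes", []):
            symbol = gene.get("symbol", "<unknown gene>")
            missing = [field for field in REQUIRED_GENE_FIELDS if field not in gene]
            if missing:
                failures.append(f"{locus_id}/{symbol}: missing {', '.join(missing)}")
            score = gene.get("overall_score")
            if score is not None and not 0.0 <= score <= 1.0:
                failures.append(f"{locus_id}/{symbol}: overall_score {score} outside 0..1")
    return failures


# Placeholder mapping with one out-of-range score.
bad_mapping = {
    "anchors": [{"rsid": "rs7903146"}],
    "loci": [{"locus_id": "locus_1", "candidate_genes": [
        {"symbol": "GENE_A", "overall_score": 1.2, "confidence": "High"},
    ]}],
}
failures = qc_failures(bad_mapping)
```

A caller would fail the run whenever the returned list is non-empty.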
Public Interface
def map_locus_to_gene(input_json: dict) -> dict:
    ...
Return:
{
  "status": "ok",
  "mapping_output_path": "./output/locus_to_gene_mapping.json",
  "summary_output_path": "./output/locus_to_gene_summary.md",
  "figure_paths": [],
  "warnings": []
}
Non-Invention Rules
- Never invent rsIDs, p-values, scores, cohort labels, tissues, or gene links.
- Never silently impute missing evidence as positive support.
- When evidence is missing, record it as a limitation and reduce confidence.
- Keep evidence provenance explicit (source skill + endpoint family) in rationale lines.
Non-Goals
- Do not claim definitive causal genes from association evidence alone.
- Do not run fine-mapping methods not directly provided by upstream sources.
- Do not collapse multiple independent signals into one without stating assumptions.