name: "pubmed-database" description: "PubMed Database workflow skill. Use this skill when the user needs direct REST API access to PubMed. Advanced Boolean and MeSH queries, E-utilities API, batch processing, and citation-oriented retrieval. For Python workflows, prefer Biopython (Bio.Entrez). Use this skill for direct HTTP/REST work or custom API implementations, and preserve upstream workflow intent, copied support files, and provenance before handoff." version: "0.0.1" category: "backend" tags:
- "pubmed-database"
- "direct"
- "rest"
- "api"
- "access"
- "pubmed"
- "advanced"
- "boolean"
- "omni-enhanced" complexity: "advanced" risk: "caution" tools:
- "codex-cli"
- "claude-code"
- "cursor"
- "gemini-cli"
- "opencode" source: "omni-team" author: "Omni Skills Team" date_added: "2026-04-15" date_updated: "2026-04-19" source_type: "omni-curated" maintainer: "Omni Skills Team" family_id: "pubmed-database" family_name: "PubMed Database" variant_id: "omni" variant_label: "Omni Curated" is_default_variant: true derived_from: "skills/pubmed-database" upstream_skill: "skills/pubmed-database" upstream_author: "sickn33" upstream_source: "community" upstream_pr: "79" upstream_head_repo: "diegosouzapw/awesome-omni-skills" upstream_head_sha: "6bf093920a93e68fa8263cf6ee767d7407989d56" curation_surface: "skills_omni" enhanced_origin: "omni-skills-private" source_repo: "diegosouzapw/awesome-omni-skills" replaces:
- "pubmed-database"
# PubMed Database

## Overview
This skill supports direct, official programmatic access to PubMed through NCBI Entrez E-utilities.
Use it when the task needs reproducible biomedical literature retrieval, fielded or MeSH-aware searching, batch export, citation-oriented extraction, or a custom integration that should stay aligned with official PubMed and NCBI behavior.
Keep the original upstream intent intact: this skill exists for direct REST access and custom workflows. For Python implementations, prefer Bio.Entrez as a client wrapper, but design and verify the workflow in terms of the underlying E-utilities semantics first.
Do not fall back to scraping PubMed HTML pages when E-utilities already expose the needed data.
## When to Use This Skill
Activate this skill when one or more of these are true:
- You need direct HTTP access to PubMed or Entrez E-utilities.
- You need a repeatable, auditable search strategy rather than an ad hoc UI search.
- You must construct advanced Boolean, field-tagged, date-bounded, publication-type, or MeSH-informed queries.
- You need to retrieve many records in batches without manually copying PMIDs.
- You need to compare ESearch, ESummary, EFetch, and ELink for the same workflow.
- You are building a custom integration for citation metadata, abstracts, identifiers, or related-record lookup.
- You must verify how PubMed interpreted a query before exporting or analyzing results.
Do not use this skill as the first choice when:
- The user only needs a quick manual literature search in the PubMed web UI.
- The task is purely Python automation and a higher-level client already covers the needed behavior; in that case, still use this skill for query design and API semantics, but implement with `Bio.Entrez`.
- The task requires unsupported data collection patterns such as HTML scraping or aggressive harvesting.
## Operating Table
| Situation | Start here | Why it matters |
|---|---|---|
| Choosing the right E-utility | `references/integration-patterns.md` | Helps decide between ESearch, ESummary, EFetch, and ELink before building requests |
| Designing a reproducible query | `examples/request-response-example.md` | Shows fielded search, translation checks, and history-server usage with concrete request patterns |
| Mapping output fields for extraction | `assets/schema-map.json` | Gives a compact machine-readable map for common citation, abstract, journal, and identifier extraction goals |
| First production call | This `SKILL.md` | Establishes safe request structure, identification, batching, and troubleshooting |
| Python implementation | This `SKILL.md`, then `examples/request-response-example.md` | Keeps REST semantics primary, then shows a Bio.Entrez equivalent without changing policy obligations |
## Workflow

### 1. Define the retrieval target
Clarify all of the following before making requests:
- Research question or operational objective
- Concepts, synonyms, abbreviations, and likely spelling variants
- Required filters such as date range, language, species, publication type, or journal
- Output need: counts only, lightweight summaries, full structured records, or related links
- Expected volume: a few records, hundreds, or a large result set needing pagination and checkpointing
For evidence-sensitive work such as systematic review support, combine controlled vocabulary and free-text terms deliberately instead of assuming one will fully cover the concept.
### 2. Build the query explicitly
Construct the query with field tags and Boolean logic instead of relying on vague free text.
Common patterns include:
- Title/abstract terms for recent phrasing: `term[Title/Abstract]`
- Author lookup: `Surname Initials[Author]`
- Journal restriction: `Journal Name[Journal]`
- Publication type: `randomized controlled trial[Publication Type]`
- Date restrictions with Entrez date parameters or explicit query clauses
- MeSH-driven concept expansion, often paired with free-text synonyms
Good practice:
- Quote phrases only when you want a phrase-level constraint.
- Use parentheses around concept groups.
- Keep a logged copy of the exact submitted query string.
- For recall-sensitive searches, pair MeSH with keyword synonyms rather than treating them as interchangeable.
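As a sketch of these practices, the helper below assembles a fielded Boolean query from concept groups, ORing synonyms within a group and ANDing the groups together. The function name and the example terms are illustrative, not part of this skill's files.

```python
# Hypothetical helper: OR synonyms within a concept group, AND the groups
# together, then AND on extra clauses such as dates or publication types.
def build_query(concept_groups, extra_clauses=()):
    parts = ["(" + " OR ".join(terms) + ")" for terms in concept_groups]
    parts.extend(extra_clauses)
    return " AND ".join(parts)

query = build_query(
    [
        ['"atrial fibrillation"[Title/Abstract]', "atrial fibrillation[MeSH Terms]"],
        ["anticoagulants[MeSH Terms]", "anticoagulant*[Title/Abstract]"],
    ],
    extra_clauses=["randomized controlled trial[Publication Type]"],
)
print(query)  # log the exact submitted string, per the practice above
```

Keeping the query assembly in one function makes it easy to log the exact submitted string alongside each batch.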
### 3. Run ESearch first and inspect interpretation
Use ESearch to verify whether PubMed interpreted the query as intended.
Minimum concerns to verify:
- `Count`
- Returned identifiers for a small first page
- Search interpretation or translation details when available
- Whether the query is unexpectedly broad or narrow
Do this before launching large exports.
If the result set is non-trivial, prefer `usehistory=y` so downstream calls can reference `WebEnv` and `query_key` instead of copying large PMID lists through every step.
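For orientation, here is a minimal sketch of pulling `Count`, `WebEnv`, and `query_key` out of an ESearch response using only the standard library. The XML fragment and its values are invented, though the element names follow the E-utilities response format.

```python
import xml.etree.ElementTree as ET

# Illustrative ESearch XML fragment (made-up values; element names follow
# the E-utilities response format).
esearch_xml = """<eSearchResult>
  <Count>2481</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>MCID_example_webenv_token</WebEnv>
  <IdList><Id>38000001</Id><Id>38000002</Id></IdList>
  <QueryTranslation>"example"[All Fields]</QueryTranslation>
</eSearchResult>"""

root = ET.fromstring(esearch_xml)
count = int(root.findtext("Count"))          # total matches, not page size
web_env = root.findtext("WebEnv")            # history-server session token
query_key = root.findtext("QueryKey")        # which query within that session
first_page = [e.text for e in root.iter("Id")]
translation = root.findtext("QueryTranslation")  # how PubMed read the query

print(count, query_key, len(first_page), translation)
```

Inspecting `QueryTranslation` (when present) before exporting is what catches automatic term mapping surprises early.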
### 4. Decide the downstream retrieval utility
Choose the next step based on the actual output need:
- ESearch: find PMIDs, counts, and search interpretation
- ESummary: lightweight metadata review, screening support, fast record summaries
- EFetch: richer record retrieval for structured extraction, abstracts, identifiers, and detailed citation fields
- ELink: related-record, citation-link, or cross-database relationships when available
Do not assume ESummary and EFetch contain the same fields.
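A small sketch of how the request parameters diverge between the two retrieval utilities when reusing history-server state. The `tool`/`email` values and the history tokens are placeholders, not working credentials.

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Shared parameters, including the identification fields NCBI asks
# clients to send (placeholder values).
common = {
    "db": "pubmed",
    "query_key": "1",
    "WebEnv": "MCID_example_webenv_token",
    "tool": "my-pipeline",
    "email": "me@example.org",
}

# ESummary for lightweight screening; EFetch for full structured records.
esummary_url = f"{BASE}/esummary.fcgi?" + urlencode({**common, "retmode": "json"})
efetch_url = f"{BASE}/efetch.fcgi?" + urlencode(
    {**common, "retmode": "xml", "rettype": "abstract"}
)

print(esummary_url)
print(efetch_url)
```

Building URLs this way keeps every submitted parameter loggable, which matters for the audit fields recommended below.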
### 5. Batch safely for larger result sets
For larger jobs:
- Call `ESearch` with `usehistory=y`
- Capture and log `Count`, `WebEnv`, and `query_key`
- Page through records with `retstart` and `retmax`
- Retrieve with `ESummary` or `EFetch` in bounded batches
- Log progress after each batch
- Check cumulative retrieved records against expected count
Operational guardrails:
- Use respectful pacing and bounded retries.
- Provide identifying request metadata such as tool and email as required by NCBI guidance.
- If using an API key, configure it explicitly rather than assuming higher throughput automatically applies.
- Do not make a single oversized request when a history-backed paginated workflow is safer.
- Checkpoint enough state to resume after interruption.
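The paging arithmetic behind these steps can be sketched as a small generator; the function name is made up for illustration.

```python
def page_windows(total_count, retmax):
    """Yield (retstart, batch_size) pairs covering `total_count` records."""
    for retstart in range(0, total_count, retmax):
        yield retstart, min(retmax, total_count - retstart)

# e.g. an ESearch Count of 2481, retrieved in bounded batches of 500
windows = list(page_windows(2481, 500))

# Reconcile after the run: the windows must add up to the expected Count.
assert sum(size for _, size in windows) == 2481
```

Logging each `(retstart, batch_size)` pair as it completes gives you the checkpoint state needed to resume after an interruption.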
Recommended audit fields per batch:
- Query string
- Utility used
- `retstart` and `retmax`
- Cumulative records written
- `WebEnv` and `query_key` when using the history server
- Timestamp and any retry events
### 6. Prefer machine-parseable formats for extraction
When building parsers or downstream transforms:
- Prefer structured formats such as XML when field reliability matters.
- Use `ESummary` only for summary-oriented metadata needs.
- Use `EFetch` when you need richer record content.
- Validate the requested `retmode` and `rettype` against the utility and extraction goal.
Use `assets/schema-map.json` as a compact reference for common extraction targets, but treat official NLM field documentation as authoritative for final interpretation.
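As an illustration of structured extraction, the fragment below parses an invented `PubmedArticle` record with the standard library. Element names follow the PubMed XML format, but, as noted above, confirm field semantics against the official NLM documentation before relying on a parser.

```python
import xml.etree.ElementTree as ET

# Illustrative EFetch XML fragment (invented values; element names follow
# the PubMed article XML format).
efetch_xml = """<PubmedArticleSet>
 <PubmedArticle>
  <MedlineCitation>
   <PMID>38000001</PMID>
   <Article>
    <Journal><Title>Example Journal</Title></Journal>
    <ArticleTitle>An example title.</ArticleTitle>
    <Abstract><AbstractText>Example abstract text.</AbstractText></Abstract>
   </Article>
  </MedlineCitation>
 </PubmedArticle>
</PubmedArticleSet>"""

records = []
for art in ET.fromstring(efetch_xml).iter("PubmedArticle"):
    records.append({
        "pmid": art.findtext(".//PMID"),
        "journal": art.findtext(".//Journal/Title"),
        "title": art.findtext(".//ArticleTitle"),
        # Abstracts can be split across several labeled AbstractText
        # elements; join them rather than taking only the first.
        "abstract": " ".join(t.text or "" for t in art.iter("AbstractText")),
    })
```

Note the abstract handling: structured abstracts arrive as multiple `AbstractText` elements, which is exactly the kind of field behavior to verify before analysis.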
### 7. Verify before analysis or handoff
Before handing results to another step or another operator, verify:
- The query returned the expected conceptual scope
- The total count was understood correctly
- Pagination covered the intended result set
- The chosen utility and format actually contain the required fields
- Any Bio.Entrez implementation matches the REST behavior for the same search or PMIDs
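Several of these checks can be mechanized. The helper below is a hypothetical sketch that reconciles retrieved records against the expected count and a list of required fields; the name and report format are made up.

```python
def verify_retrieval(expected_count, records, required_fields):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(records) != expected_count:
        problems.append(f"retrieved {len(records)} records, expected {expected_count}")
    for field in required_fields:
        missing = sum(1 for rec in records if not rec.get(field))
        if missing:
            problems.append(f"{missing} record(s) missing {field!r}")
    return problems
```

Running this before handoff catches both pagination gaps (count mismatch) and utility/format mismatches (missing fields) in one pass.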
## Troubleshooting

### Problem: Unexpectedly broad or narrow results
Check:
- Whether PubMed automatic term mapping changed the meaning of the query
- Whether phrase quoting is too restrictive or too loose
- Whether field tags were omitted or applied to the wrong clause
- Whether MeSH terms, explosion behavior, or free-text synonyms are mismatched
- Whether date or publication-type filters are suppressing expected records
Action:
- Re-run a small `ESearch`
- Inspect translation behavior
- Compare a fielded query against a simpler baseline
- Log the exact before/after query strings
### Problem: Only the first page was retrieved
Cause:
- `retstart`/`retmax` pagination was not implemented, or `usehistory=y` was omitted for larger retrievals.
Action:
- Repeat `ESearch` with history enabled
- Capture `WebEnv` and `query_key`
- Page explicitly and reconcile total retrieved vs `Count`
### Problem: Missing fields in the response
Cause:
- The selected utility or format does not expose the needed field.
Action:
- Compare `ESummary` versus `EFetch`
- Verify `retmode` and `rettype`
- Check `assets/schema-map.json` for common expectations
- Confirm field semantics in official NLM documentation before changing the parser
### Problem: HTTP 429, temporary blocks, or unstable responses
Action:
- Slow down request rate and reduce concurrency
- Add bounded backoff and retry with logging
- Confirm identifying metadata and API key configuration
- Prefer scheduled, history-based batch retrieval over bursty repeated searches
- Review current official NCBI usage guidance instead of assuming a fixed limit from memory
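A minimal sketch of bounded backoff with an injectable sleep function (names are illustrative). A real handler should also distinguish retryable status codes such as 429 from permanent failures.

```python
import time

def with_backoff(call, attempts=4, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Run `call`, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on as exc:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

Injecting `sleep` keeps the retry logic testable and lets a scheduler substitute its own pacing; the printed retry events map directly onto the audit fields recommended in the batching step.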
### Problem: ELink results look incomplete
Cause:
- Link coverage depends on the selected link name and on NCBI data availability.
Action:
- Verify the exact `linkname`
- Treat returned relationships as availability-dependent, not guaranteed complete citation coverage
- Record which link type was used in downstream outputs
### Problem: REST and Bio.Entrez outputs do not match
Action:
- Compare the exact database, utility, query, and format parameters
- Ensure both paths use the same IDs or the same history state
- Confirm parsing assumptions rather than assuming the client wrapper changed PubMed behavior
## Additional Resources
- `references/integration-patterns.md` for utility selection, history-server decisions, batching, and output-format notes
- `examples/request-response-example.md` for concrete REST requests, expected response elements, and a Bio.Entrez equivalent
- Official NCBI Entrez Programming Utilities Help: https://www.ncbi.nlm.nih.gov/books/NBK25501/
- Official PubMed User Guide: https://pubmed.ncbi.nlm.nih.gov/help/
- MeSH reference: https://www.ncbi.nlm.nih.gov/mesh
- Biopython Entrez tutorial: https://biopython.org/docs/latest/Tutorial/chapter_entrez.html
- Biopython `Bio.Entrez` API reference: https://biopython.org/docs/latest/api/Bio.Entrez.html
- NLM MEDLINE/PubMed field descriptions: https://www.nlm.nih.gov/bsd/mms/medlineelements.html
## Related Skills
Use a neighboring skill instead when the task drifts into:
- generic literature review planning without direct API work
- citation formatting only, without PubMed retrieval design
- Python-only implementation details that do not require direct REST workflow reasoning
- broader biomedical database comparison beyond PubMed and Entrez
## Notes on Upstream Intent and Provenance
This enhanced candidate preserves the upstream skill identity and scope: direct PubMed access, advanced query construction, E-utilities use, batch processing, and citation-oriented retrieval. The wording has been rewritten into an operator-focused playbook so the skill is safer and more executable without changing its core purpose.