name: "pubmed-database" description: "PubMed Database workflow skill. Use this skill when the user needs direct REST API access to PubMed. Advanced Boolean and MeSH queries, E-utilities API, batch processing, and citation-oriented retrieval. For Python workflows, prefer Biopython (Bio.Entrez). Use this skill for direct HTTP/REST work or custom API implementations, and preserve upstream workflow intent, copied support files, and provenance before handoff." version: "0.0.1" category: "backend" tags:
- "pubmed-database"
- "direct"
- "rest"
- "api"
- "access"
- "pubmed"
- "advanced"
- "boolean"
- "omni-enhanced" complexity: "advanced" risk: "caution" tools:
- "codex-cli"
- "claude-code"
- "cursor"
- "gemini-cli"
- "opencode" source: "omni-team" author: "Omni Skills Team" date_added: "2026-04-15" date_updated: "2026-04-19" source_type: "omni-curated" maintainer: "Omni Skills Team" family_id: "pubmed-database" family_name: "PubMed Database" variant_id: "omni" variant_label: "Omni Curated" is_default_variant: true derived_from: "skills/pubmed-database" upstream_skill: "skills/pubmed-database" upstream_author: "sickn33" upstream_source: "community" upstream_pr: "79" upstream_head_repo: "diegosouzapw/awesome-omni-skills" upstream_head_sha: "6bf093920a93e68fa8263cf6ee767d7407989d56" curation_surface: "skills_omni" enhanced_origin: "omni-skills-private" source_repo: "diegosouzapw/awesome-omni-skills" replaces:
- "pubmed-database"
# PubMed Database

## Overview
This skill supports direct, official programmatic access to PubMed through NCBI Entrez E-utilities.
Use it when the task needs reproducible biomedical literature retrieval, fielded or MeSH-aware searching, batch export, citation-oriented extraction, or a custom integration that should stay aligned with official PubMed and NCBI behavior.
Keep the original upstream intent intact: this skill exists for direct REST access and custom workflows. For Python implementations, prefer Bio.Entrez as a client wrapper, but design and verify the workflow in terms of the underlying E-utilities semantics first.
Do not fall back to scraping PubMed HTML pages when E-utilities already expose the needed data.
## When to Use This Skill
Activate this skill when one or more of these are true:
- You need direct HTTP access to PubMed or Entrez E-utilities.
- You need a repeatable, auditable search strategy rather than an ad hoc UI search.
- You must construct advanced Boolean, field-tagged, date-bounded, publication-type, or MeSH-informed queries.
- You need to retrieve many records in batches without manually copying PMIDs.
- You need to compare ESearch, ESummary, EFetch, and ELink for the same workflow.
- You are building a custom integration for citation metadata, abstracts, identifiers, or related-record lookup.
- You must verify how PubMed interpreted a query before exporting or analyzing results.
Do not use this skill as the first choice when:
- The user only needs a quick manual literature search in the PubMed web UI.
- The task is purely Python automation and a higher-level client already covers the needed behavior; in that case, still use this skill for query design and API semantics, but implement with `Bio.Entrez`.
- The task requires unsupported data collection patterns such as HTML scraping or aggressive harvesting.
## Operating Table
| Situation | Start here | Why it matters |
|---|---|---|
| Choosing the right E-utility | `references/integration-patterns.md` | Helps decide between ESearch, ESummary, EFetch, and ELink before building requests |
| Designing a reproducible query | `examples/request-response-example.md` | Shows fielded search, translation checks, and history-server usage with concrete request patterns |
| Mapping output fields for extraction | `assets/schema-map.json` | Gives a compact machine-readable map for common citation, abstract, journal, and identifier extraction goals |
| First production call | This `SKILL.md` | Establishes safe request structure, identification, batching, and troubleshooting |
| Python implementation | This `SKILL.md`, then `examples/request-response-example.md` | Keeps REST semantics primary, then shows a Bio.Entrez equivalent without changing policy obligations |
## Workflow

### 1. Define the retrieval target
Clarify all of the following before making requests:
- Research question or operational objective
- Concepts, synonyms, abbreviations, and likely spelling variants
- Required filters such as date range, language, species, publication type, or journal
- Output need: counts only, lightweight summaries, full structured records, or related links
- Expected volume: a few records, hundreds, or a large result set needing pagination and checkpointing
For evidence-sensitive work such as systematic review support, combine controlled vocabulary and free-text terms deliberately instead of assuming one will fully cover the concept.
### 2. Build the query explicitly
Construct the query with field tags and Boolean logic instead of relying on vague free text.
Common patterns include:
- Title/abstract terms for recent phrasing: `term[Title/Abstract]`
- Author lookup: `Surname Initials[Author]`
- Journal restriction: `Journal Name[Journal]`
- Publication type: `randomized controlled trial[Publication Type]`
- Date restrictions with Entrez date parameters or explicit query clauses
- MeSH-driven concept expansion, often paired with free-text synonyms
Good practice:
- Quote phrases only when you want a phrase-level constraint.
- Use parentheses around concept groups.
- Keep a logged copy of the exact submitted query string.
- For recall-sensitive searches, pair MeSH with keyword synonyms rather than treating them as interchangeable.
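As a sketch of these practices, the helper below assembles a fielded Boolean query from concept groups, ORing synonyms within a group and ANDing the groups together. The function name and the example terms are illustrative, not part of this skill's files.

```python
# Hypothetical helper: OR synonyms within a concept group, AND the groups
# together, then AND on extra clauses such as dates or publication types.
def build_query(concept_groups, extra_clauses=()):
    parts = ["(" + " OR ".join(terms) + ")" for terms in concept_groups]
    parts.extend(extra_clauses)
    return " AND ".join(parts)

query = build_query(
    [
        ['"atrial fibrillation"[Title/Abstract]', "atrial fibrillation[MeSH Terms]"],
        ["anticoagulants[MeSH Terms]", "anticoagulant*[Title/Abstract]"],
    ],
    extra_clauses=["randomized controlled trial[Publication Type]"],
)
print(query)  # log the exact submitted string, per the practice above
```

Keeping the query assembly in one function makes it easy to log the exact submitted string alongside each batch.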
### 3. Run ESearch first and inspect interpretation
Use ESearch to verify whether PubMed interpreted the query as intended.
Minimum concerns to verify:
- `Count`
- Returned identifiers for a small first page
- Search interpretation or translation details when available
- Whether the query is unexpectedly broad or narrow
Do this before launching large exports.
If the result set is non-trivial, prefer `usehistory=y` so downstream calls can reference `WebEnv` and `query_key` instead of copying large PMID lists through every step.
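For orientation, here is a minimal sketch of pulling `Count`, `WebEnv`, and `query_key` out of an ESearch response using only the standard library. The XML fragment and its values are invented, though the element names follow the E-utilities response format.

```python
import xml.etree.ElementTree as ET

# Illustrative ESearch XML fragment (made-up values; element names follow
# the E-utilities response format).
esearch_xml = """<eSearchResult>
  <Count>2481</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>MCID_example_webenv_token</WebEnv>
  <IdList><Id>38000001</Id><Id>38000002</Id></IdList>
  <QueryTranslation>"example"[All Fields]</QueryTranslation>
</eSearchResult>"""

root = ET.fromstring(esearch_xml)
count = int(root.findtext("Count"))          # total matches, not page size
web_env = root.findtext("WebEnv")            # history-server session token
query_key = root.findtext("QueryKey")        # which query within that session
first_page = [e.text for e in root.iter("Id")]
translation = root.findtext("QueryTranslation")  # how PubMed read the query

print(count, query_key, len(first_page), translation)
```

Inspecting `QueryTranslation` (when present) before exporting is what catches automatic term mapping surprises early.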
### 4. Decide the downstream retrieval utility
Choose the next step based on the actual output need:
- ESearch: find PMIDs, counts, and search interpretation
- ESummary: lightweight metadata review, screening support, fast record summaries
- EFetch: richer record retrieval for structured extraction, abstracts, identifiers, and detailed citation fields
- ELink: related-record, citation-link, or cross-database relationships when available
Do not assume ESummary and EFetch contain the same fields.
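A small sketch of how the request parameters diverge between the two retrieval utilities when reusing history-server state. The `tool`/`email` values and the history tokens are placeholders, not working credentials.

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Shared parameters, including the identification fields NCBI asks
# clients to send (placeholder values).
common = {
    "db": "pubmed",
    "query_key": "1",
    "WebEnv": "MCID_example_webenv_token",
    "tool": "my-pipeline",
    "email": "me@example.org",
}

# ESummary for lightweight screening; EFetch for full structured records.
esummary_url = f"{BASE}/esummary.fcgi?" + urlencode({**common, "retmode": "json"})
efetch_url = f"{BASE}/efetch.fcgi?" + urlencode(
    {**common, "retmode": "xml", "rettype": "abstract"}
)

print(esummary_url)
print(efetch_url)
```

Building URLs this way keeps every submitted parameter loggable, which matters for the audit fields recommended below.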
### 5. Batch safely for larger result sets
For larger jobs:
- Call `ESearch` with `usehistory=y`
- Capture and log `Count`, `WebEnv`, and `query_key`
- Page through records with `retstart` and `retmax`
- Retrieve with `ESummary` or `EFetch` in bounded batches
- Log progress after each batch
- Check cumulative retrieved records against expected count
Operational guardrails:
- Use respectful pacing and bounded retries.
- Provide identifying request metadata such as tool and email as required by NCBI guidance.
- If using an API key, configure it explicitly rather than assuming higher throughput automatically applies.
- Do not make a single oversized request when a history-backed paginated workflow is safer.
- Checkpoint enough state to resume after interruption.
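The paging arithmetic behind these steps can be sketched as a small generator; the function name is made up for illustration.

```python
def page_windows(total_count, retmax):
    """Yield (retstart, batch_size) pairs covering `total_count` records."""
    for retstart in range(0, total_count, retmax):
        yield retstart, min(retmax, total_count - retstart)

# e.g. an ESearch Count of 2481, retrieved in bounded batches of 500
windows = list(page_windows(2481, 500))

# Reconcile after the run: the windows must add up to the expected Count.
assert sum(size for _, size in windows) == 2481
```

Logging each `(retstart, batch_size)` pair as it completes gives you the checkpoint state needed to resume after an interruption.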
Recommended audit fields per batch:
- Query string
- Utility used
- `retstart` and `retmax`
- Cumulative records written
- `WebEnv` and `query_key` when using the history server
- Timestamp and any retry events
### 6. Prefer machine-parseable formats for extraction
When building parsers or downstream transforms:
- Prefer structured formats such as XML when field reliability matters.
- Use `ESummary` only for summary-oriented metadata needs.
- Use `EFetch` when you need richer record content.
- Validate the requested `retmode` and `rettype` against the utility and extraction goal.
Use `assets/schema-map.json` as a compact reference for common extraction targets, but treat official NLM field documentation as authoritative for final interpretation.
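As an illustration of structured extraction, the fragment below parses an invented `PubmedArticle` record with the standard library. Element names follow the PubMed XML format, but, as noted above, confirm field semantics against the official NLM documentation before relying on a parser.

```python
import xml.etree.ElementTree as ET

# Illustrative EFetch XML fragment (invented values; element names follow
# the PubMed article XML format).
efetch_xml = """<PubmedArticleSet>
 <PubmedArticle>
  <MedlineCitation>
   <PMID>38000001</PMID>
   <Article>
    <Journal><Title>Example Journal</Title></Journal>
    <ArticleTitle>An example title.</ArticleTitle>
    <Abstract><AbstractText>Example abstract text.</AbstractText></Abstract>
   </Article>
  </MedlineCitation>
 </PubmedArticle>
</PubmedArticleSet>"""

records = []
for art in ET.fromstring(efetch_xml).iter("PubmedArticle"):
    records.append({
        "pmid": art.findtext(".//PMID"),
        "journal": art.findtext(".//Journal/Title"),
        "title": art.findtext(".//ArticleTitle"),
        # Abstracts can be split across several labeled AbstractText
        # elements; join them rather than taking only the first.
        "abstract": " ".join(t.text or "" for t in art.iter("AbstractText")),
    })
```

Note the abstract handling: structured abstracts arrive as multiple `AbstractText` elements, which is exactly the kind of field behavior to verify before analysis.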
### 7. Verify before analysis or handoff
Before handing results to another step or another operator, verify:
- The query returned the expected conceptual scope
- The total count was understood correctly
- Pagination covered the intended result set
- The chosen utility and format actually contain the required fields
- Any Bio.Entrez implementation matches the REST behavior for the same search or PMIDs
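Several of these checks can be mechanized. The helper below is a hypothetical sketch that reconciles retrieved records against the expected count and a list of required fields; the name and report format are made up.

```python
def verify_retrieval(expected_count, records, required_fields):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(records) != expected_count:
        problems.append(f"retrieved {len(records)} records, expected {expected_count}")
    for field in required_fields:
        missing = sum(1 for rec in records if not rec.get(field))
        if missing:
            problems.append(f"{missing} record(s) missing {field!r}")
    return problems
```

Running this before handoff catches both pagination gaps (count mismatch) and utility/format mismatches (missing fields) in one pass.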
## Troubleshooting

### Problem: Unexpectedly broad or narrow results
Check:
- Whether PubMed automatic term mapping changed the meaning of the query
- Whether phrase quoting is too restrictive or too loose
- Whether field tags were omitted or applied to the wrong clause
- Whether MeSH terms, explosion behavior, or free-text synonyms are mismatched
- Whether date or publication-type filters are suppressing expected records
Action:
- Re-run a small `ESearch`
- Inspect translation behavior
- Compare a fielded query against a simpler baseline
- Log the exact before/after query strings
### Problem: Only the first page was retrieved
Cause:
- `retstart`/`retmax` pagination was not implemented, or `usehistory=y` was omitted for larger retrievals.
Action:
- Repeat `ESearch` with history enabled
- Capture `WebEnv` and `query_key`
- Page explicitly and reconcile total retrieved vs `Count`
### Problem: Missing fields in the response
Cause:
- The selected utility or format does not expose the needed field.
Action:
- Compare `ESummary` versus `EFetch`
- Verify `retmode` and `rettype`
- Check `assets/schema-map.json` for common expectations
- Confirm field semantics in official NLM documentation before changing the parser
### Problem: HTTP 429, temporary blocks, or unstable responses
Action:
- Slow down request rate and reduce concurrency
- Add bounded backoff and retry with logging
- Confirm identifying metadata and API key configuration
- Prefer scheduled, history-based batch retrieval over bursty repeated searches
- Review current official NCBI usage guidance instead of assuming a fixed limit from memory
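A minimal sketch of bounded backoff with an injectable sleep function (names are illustrative). A real handler should also distinguish retryable status codes such as 429 from permanent failures.

```python
import time

def with_backoff(call, attempts=4, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Run `call`, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on as exc:
            if attempt == attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

Injecting `sleep` keeps the retry logic testable and lets a scheduler substitute its own pacing; the printed retry events map directly onto the audit fields recommended in the batching step.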
### Problem: ELink results look incomplete
Cause:
- Link coverage depends on the selected link name and on NCBI data availability.
Action:
- Verify the exact `linkname`
- Treat returned relationships as availability-dependent, not guaranteed complete citation coverage
- Record which link type was used in downstream outputs
### Problem: REST and Bio.Entrez outputs do not match
Action:
- Compare the exact database, utility, query, and format parameters
- Ensure both paths use the same IDs or the same history state
- Confirm parsing assumptions rather than assuming the client wrapper changed PubMed behavior
## Additional Resources
- `references/integration-patterns.md` for utility selection, history-server decisions, batching, and output-format notes
- `examples/request-response-example.md` for concrete REST requests, expected response elements, and a Bio.Entrez equivalent
- Official NCBI Entrez Programming Utilities Help: https://www.ncbi.nlm.nih.gov/books/NBK25501/
- Official PubMed User Guide: https://pubmed.ncbi.nlm.nih.gov/help/
- MeSH reference: https://www.ncbi.nlm.nih.gov/mesh
- Biopython Entrez tutorial: https://biopython.org/docs/latest/Tutorial/chapter_entrez.html
- Biopython `Bio.Entrez` API reference: https://biopython.org/docs/latest/api/Bio.Entrez.html
- NLM MEDLINE/PubMed field descriptions: https://www.nlm.nih.gov/bsd/mms/medlineelements.html
## Related Skills
Use a neighboring skill instead when the task drifts into:
- generic literature review planning without direct API work
- citation formatting only, without PubMed retrieval design
- Python-only implementation details that do not require direct REST workflow reasoning
- broader biomedical database comparison beyond PubMed and Entrez
## Notes on Upstream Intent and Provenance
This enhanced candidate preserves the upstream skill identity and scope: direct PubMed access, advanced query construction, E-utilities use, batch processing, and citation-oriented retrieval. The wording has been rewritten into an operator-focused playbook so the skill is safer and more executable without changing its core purpose.