name: curate-genome-assembly
description: Process genome assembly datasets for VEuPathDB resources
Genome Assembly Dataset Curation
This skill guides processing of genome assembly datasets for VEuPathDB resources.
Prerequisites Check
This workflow requires the following repositories in veupathdb-repos/:
- ApiCommonPresenters
- EbrcModelCommon
First, run the repository status check to verify that the repositories are present.
Note: this script is located in the skill directory.
bash scripts/check-repos.sh ApiCommonPresenters EbrcModelCommon
If repositories are missing, the script will provide clone instructions.
Branch Confirmation: After verifying repositories exist, check their current branches and status using git -C <path>, then confirm with the user before proceeding. Users typically create dataset-specific branches (see curator branching guidelines).
Example:
git -C veupathdb-repos/ApiCommonPresenters branch --show-current
git -C veupathdb-repos/ApiCommonPresenters status -sb
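For a scripted version of this check, the same two git commands can be wrapped so they always run from the curation workspace directory. The following is a minimal sketch in Node.js, assuming `git` is on the PATH; the helper names `gitArgs` and `repoState` are hypothetical and are not part of the skill's scripts.

```javascript
const { execFileSync } = require("child_process");
const path = require("path");

// Directory that holds the repositories, relative to the curation workspace.
const REPO_ROOT = "veupathdb-repos";

// Build the argument list for `git -C <path> ...` so no `cd` is needed.
function gitArgs(repo, ...args) {
  return ["-C", path.posix.join(REPO_ROOT, repo), ...args];
}

// Hypothetical helper: report the current branch and short status of one repo.
// Must be called from the curation workspace directory.
function repoState(repo) {
  const run = (...args) =>
    execFileSync("git", gitArgs(repo, ...args), { encoding: "utf8" }).trim();
  return {
    repo,
    branch: run("branch", "--show-current"),
    status: run("status", "-sb"),
  };
}

// Usage (run from the curation workspace directory):
// for (const repo of ["ApiCommonPresenters", "EbrcModelCommon"]) {
//   console.log(repoState(repo));
// }
```

The output is only a summary for the user to review; confirmation of the branch still happens in conversation before any file is changed.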
Working Directory (Curation Workspace Directory)
IMPORTANT: All commands in this workflow must be run from your curation workspace directory (the directory that contains veupathdb-repos/ as a subdirectory).
For Claude Code:
- DO NOT use cd commands to change into veupathdb-repos/ subdirectories
- Use git -C <path> for git operations in subdirectories
- Use absolute paths or relative paths from the curation workspace directory
- Example: git -C veupathdb-repos/ApiCommonPresenters status instead of cd veupathdb-repos/ApiCommonPresenters && git status
The workflow will create a tmp/ subdirectory in the curation workspace directory for intermediate files.
Required Information
Gather the following before starting:
- VEuPathDB project - Valid projects listed in resources/valid-projects.json
- Assembly GenBank accession, including version (e.g., GCA_000988875.2)
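Both inputs can be checked before any network request is made. The sketch below assumes that a versioned GenBank assembly accession has the form `GCA_` followed by nine digits, a dot, and a version number. The structure of resources/valid-projects.json is not shown here, so `isValidProject` is a hypothetical helper that expects the caller to supply a plain array of project names.

```javascript
// Versioned GenBank assembly accession: GCA_ + nine digits + "." + version.
const VERSIONED_GCA = /^GCA_\d{9}\.\d+$/;

// True only when the accession is a GenBank (GCA_) accession with a version.
function isVersionedGenBankAccession(accession) {
  return VERSIONED_GCA.test(accession);
}

// Hypothetical check: `validProjects` is assumed to be an array of strings
// already extracted from resources/valid-projects.json.
function isValidProject(project, validProjects) {
  return validProjects.includes(project);
}

// Usage:
// isVersionedGenBankAccession("GCA_000988875.2"); // versioned, accepted
// isVersionedGenBankAccession("GCA_000988875");   // version missing, rejected
```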
Workflow Overview
Step 1: Fetch Assembly Metadata from NCBI
Fetch assembly metadata from NCBI using the GenBank accession.
Command:
curl -X GET "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/<ASSEMBLY_ACCESSION>/dataset_report" \
-H "Accept: application/json" > tmp/<ASSEMBLY_ACCESSION>_dataset_report.json
Detailed instructions: Step 1 - Fetch NCBI Metadata
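Once the report is saved, the fields needed by later steps can be read from it. The sketch below is illustrative only: the field names (`reports`, `assembly_info.bioproject_accession`, `assembly_info.submitter`, `organism.organism_name`, `organism.tax_id`) are assumptions based on the public NCBI Datasets v2 schema and should be confirmed against the actual response. The function names are hypothetical.

```javascript
// Build the same NCBI Datasets v2 URL that the curl command requests.
function datasetReportUrl(accession) {
  return (
    "https://api.ncbi.nlm.nih.gov/datasets/v2/genome/accession/" +
    encodeURIComponent(accession) +
    "/dataset_report"
  );
}

// Extract the fields used later in the workflow.
// ASSUMPTION: field names follow the public Datasets v2 schema.
function summarizeReport(reportJson) {
  const report = (reportJson.reports || [])[0];
  if (!report) {
    throw new Error("No assembly report found in response");
  }
  const info = report.assembly_info || {};
  const organism = report.organism || {};
  return {
    accession: report.accession,
    bioproject: info.bioproject_accession,
    submitter: info.submitter,
    organismName: organism.organism_name,
    taxId: organism.tax_id,
  };
}

// Usage (from the curation workspace directory):
// const fs = require("fs");
// const file = "tmp/GCA_000988875.2_dataset_report.json";
// console.log(summarizeReport(JSON.parse(fs.readFileSync(file, "utf8"))));
```

Throwing on an empty `reports` array makes a mistyped or withdrawn accession visible immediately instead of producing undefined values in later steps.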
Step 2: Fetch BioProject Metadata
Extract the BioProject accession from the assembly report and fetch additional details.
Command:
node scripts/fetch-bioproject.js <BIOPROJECT_ACCESSION>
This retrieves the BioProject title and description and saves them to tmp/<BIOPROJECT_ACCESSION>_bioproject.json.
Detailed instructions: Step 2 - Fetch BioProject
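The Scripts section describes fetch-bioproject.js as using esearch followed by esummary. The sketch below shows how those two NCBI E-utilities requests can be formed. It is not the script's actual code; the function names are hypothetical, and passing the bare accession as the search term is an assumption.

```javascript
const EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

// Step A: esearch resolves a BioProject accession to a numeric UID.
function esearchUrl(bioprojectAccession) {
  const params = new URLSearchParams({
    db: "bioproject",
    term: bioprojectAccession,
    retmode: "json",
  });
  return `${EUTILS}/esearch.fcgi?${params}`;
}

// Step B: esummary returns the summary record for that UID.
function esummaryUrl(uid) {
  const params = new URLSearchParams({
    db: "bioproject",
    id: String(uid),
    retmode: "json",
  });
  return `${EUTILS}/esummary.fcgi?${params}`;
}
```

Splitting the lookup into two URL builders keeps the network calls separate from the string handling, so the request format can be checked without contacting NCBI.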
Step 3: Fetch PubMed Data
Find and fetch publications for the genome assembly.
Command:
node scripts/fetch-pubmed.js <ASSEMBLY_ACCESSION>
Results saved to tmp/<ASSEMBLY_ACCESSION>_pubmed.json.
Detailed instructions: Step 3 - Fetch PubMed
Step 4: Curate Contacts
Identify and curate contact entries for the genome submission.
Contact identification priority:
- Named submitter from assembly metadata
- Senior/last author from PubMed publications (if available)
- Curator judgment for additional contacts
Actions:
- Search existing contacts in veupathdb-repos/EbrcModelCommon/Model/lib/xml/datasetPresenters/contacts/allContacts.xml
- Create new contact entries if needed
- Present choices to curator for review
Detailed instructions: Step 4 - Curate Contacts
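The first two levels of the contact identification priority can be sketched as a small ordering function; the third level, curator judgment, stays with the curator. The input shapes are hypothetical: `submitter` is a string taken from the assembly metadata, and `publications` is an array of objects with an `authors` array of names. The output files of the fetch scripts may be structured differently.

```javascript
// Order candidate contacts by priority:
//   1. named submitter from the assembly metadata
//   2. senior (last) author of each publication
// Names are de-duplicated case-insensitively, keeping the first occurrence.
function contactCandidates(submitter, publications = []) {
  const candidates = [];
  const seen = new Set();

  const add = (name, source) => {
    const key = (name || "").trim().toLowerCase();
    if (!key || seen.has(key)) return; // skip empty or repeated names
    seen.add(key);
    candidates.push({ name: name.trim(), source });
  };

  add(submitter, "assembly submitter");
  for (const pub of publications) {
    const authors = pub.authors || [];
    add(authors[authors.length - 1], "senior author");
  }
  return candidates;
}
```

The resulting list is only a starting point to present to the curator, who decides which entries to match against allContacts.xml or create as new contacts.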
Step 5: Generate and Insert Presenter XML
Generate the datasetPresenter XML and insert it into the appropriate presenter file.
Command:
node scripts/generate-presenter-xml.js <ASSEMBLY_ACCESSION> <PROJECT> <PRIMARY_CONTACT_ID> [ADDITIONAL_CONTACT_IDS...]
Target file: veupathdb-repos/ApiCommonPresenters/Model/lib/xml/datasetPresenters/<PROJECT>.xml
Detailed instructions: Step 5 - Update Presenter Files
Next Steps
After completing this workflow:
- Review generated XML for TODO fields that require curator input
- Commit changes to dataset branch (curator handles git operations)
- Create pull request for review (curator handles PR creation)
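To help with the first review item, the remaining TODO fields can be listed by line number. This is a minimal sketch that assumes the generator marks unfinished fields with the literal text `TODO`; the function name `findTodoLines` is hypothetical.

```javascript
// Return the 1-based line number and trimmed text of every line containing "TODO".
// ASSUMPTION: unfinished fields are marked with the uppercase string "TODO".
function findTodoLines(xmlText) {
  return xmlText
    .split(/\r?\n/)
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((entry) => entry.text.includes("TODO"));
}

// Usage (from the curation workspace directory; <PROJECT> is a placeholder):
// const fs = require("fs");
// const file = "veupathdb-repos/ApiCommonPresenters/Model/lib/xml/datasetPresenters/<PROJECT>.xml";
// console.table(findTodoLines(fs.readFileSync(file, "utf8")));
```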
Resources
- Step 1 - Fetch NCBI Metadata
- Step 2 - Fetch BioProject
- Step 3 - Fetch PubMed
- Step 4 - Curate Contacts
- Step 5 - Update Presenter Files
- Curator Branching Guidelines
- Valid VEuPathDB Projects
Scripts
- scripts/fetch-bioproject.js - Fetches BioProject metadata from NCBI (esearch + esummary)
- scripts/fetch-pubmed.js - Fetches PubMed records linked to a BioProject (elink + esummary)
- scripts/generate-presenter-xml.js - Generates datasetPresenter XML from fetched metadata
- scripts/check-repos.sh - Validates veupathdb-repos/ repository setup (synced from shared/)