name: clinicaltrials-database-search
description: Query ClinicalTrials.gov API v2 for trial data. Search by condition, drug/intervention, location, sponsor, or phase; fetch details by NCT ID; filter by status; paginate; export CSV. For clinical research, patient matching, and trial portfolio analysis.
license: CC-BY-4.0

ClinicalTrials.gov Database — Clinical Trial Search

Overview

Query the ClinicalTrials.gov API v2 (public, no authentication) to search and retrieve clinical trial data worldwide. Supports searching by condition, intervention, location, sponsor, and status; retrieving detailed study information by NCT ID; paginating large result sets; and exporting to CSV.

When to Use

Searching for recruiting clinical trials for a specific condition or disease
Finding trials testing a specific drug, device, or intervention
Locating trials in a specific geographic region for patient referral
Tracking a sponsor's or institution's clinical trial portfolio
Retrieving detailed eligibility criteria, outcomes, and contacts for a specific trial
Analyzing clinical trial trends (phases, enrollment, timelines) across a therapeutic area
Exporting trial data for systematic reviews or meta-analyses
Monitoring trial status changes and results postings
For chemical compound bioactivity data, use chembl-database-bioactivity instead; for published literature, use pubmed-database.

Prerequisites

uv pip install requests pandas

API details:

Base URL: https://clinicaltrials.gov/api/v2
Authentication: None required (public API)
Rate limit: ~50 requests/minute per IP
Response formats: JSON (default), CSV
Max page size: 1000 studies per request
Date format: ISO 8601; text fields use CommonMark Markdown
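
To see how these details combine into a single request, the sketch below assembles a search URL using only the standard library. The helper name `build_search_url` is made up for illustration; the parameter names are the ones listed in this entry, and dropping `None` values is just a convenience of the sketch.

```python
from urllib.parse import urlencode

CT_API = "https://clinicaltrials.gov/api/v2"

def build_search_url(params):
    """Return the full /studies URL for a dict of query parameters (illustrative helper)."""
    # Drop unset values so optional filters can be passed as None
    clean = {k: v for k, v in params.items() if v is not None}
    return f"{CT_API}/studies?{urlencode(clean)}"

url = build_search_url({
    "query.cond": "breast cancer",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 10,
    "pageToken": None,  # omitted from the URL
})
print(url)
```

Pasting the printed URL into a browser is a quick way to inspect the raw JSON before writing any parsing code.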

Quick Start

import requests
import time

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    """Reusable helper for ClinicalTrials.gov searches."""
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Search for recruiting breast cancer trials
results = ct_search({
    "query.cond": "breast cancer",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 10,
    "sort": "LastUpdatePostDate:desc",
    "countTotal": "true",  # totalCount is only returned when this is set
})
print(f"Found {results['totalCount']} trials")
for study in results['studies'][:3]:
    nct = study['protocolSection']['identificationModule']['nctId']
    title = study['protocolSection']['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")

Key Concepts

Response Data Structure

ClinicalTrials.gov returns deeply nested JSON. Key navigation paths:

Data	Path
NCT ID	`study['protocolSection']['identificationModule']['nctId']`
Title	`study['protocolSection']['identificationModule']['briefTitle']`
Status	`study['protocolSection']['statusModule']['overallStatus']`
Phase	`study['protocolSection']['designModule']['phases']`
Enrollment	`study['protocolSection']['designModule']['enrollmentInfo']['count']`
Eligibility	`study['protocolSection']['eligibilityModule']`
Locations	`study['protocolSection']['contactsLocationsModule']['locations']`
Interventions	`study['protocolSection']['armsInterventionsModule']['interventions']`
Results	`study.get('resultsSection')` (None if no results posted)
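
As a rough illustration of these paths, the sketch below walks a hand-written record. The record is hypothetical and heavily trimmed (the NCT ID, title, and counts are placeholders); real responses carry many more modules.

```python
# Hypothetical, trimmed record used only to demonstrate the navigation paths
sample_study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000000", "briefTitle": "Example Study"},
        "statusModule": {"overallStatus": "RECRUITING"},
        "designModule": {"phases": ["PHASE2"], "enrollmentInfo": {"count": 120}},
    },
    "hasResults": False,
}

proto = sample_study["protocolSection"]
nct_id = proto["identificationModule"]["nctId"]
status = proto["statusModule"]["overallStatus"]
phases = proto["designModule"]["phases"]          # a list, not a single string
enrollment = proto["designModule"]["enrollmentInfo"]["count"]
posted_results = sample_study.get("resultsSection")  # None here: nothing posted
# This module is absent from the sample, so chained .get() calls fall back to []
locations = proto.get("contactsLocationsModule", {}).get("locations", [])

print(nct_id, status, phases, enrollment, posted_results, len(locations))
```

Direct indexing is used for the modules present in the sample; optional modules are read with `.get()` so a missing key does not raise.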

Study Status Values

Status	Description
`RECRUITING`	Currently recruiting participants
`NOT_YET_RECRUITING`	Approved but not yet open
`ENROLLING_BY_INVITATION`	Invitation-only enrollment
`ACTIVE_NOT_RECRUITING`	Active, enrollment closed
`SUSPENDED`	Temporarily halted
`TERMINATED`	Stopped prematurely
`COMPLETED`	Study concluded
`WITHDRAWN`	Withdrawn before enrollment
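
Because the status filter expects these exact uppercase names, it can help to normalize and check values before sending a request. The helper below is a minimal sketch: `status_filter` and `VALID_STATUSES` are illustrative names, and the set contains only the statuses listed in the table above, so the API may accept values that this check rejects.

```python
# Only the statuses documented in this entry; the API may define others
VALID_STATUSES = {
    "RECRUITING", "NOT_YET_RECRUITING", "ENROLLING_BY_INVITATION",
    "ACTIVE_NOT_RECRUITING", "SUSPENDED", "TERMINATED", "COMPLETED", "WITHDRAWN",
}

def status_filter(*statuses):
    """Build a comma-separated filter.overallStatus value, rejecting unknown names."""
    # Accept loose input such as "recruiting" or "Not yet recruiting"
    normalized = [s.strip().upper().replace(" ", "_") for s in statuses]
    unknown = [s for s in normalized if s not in VALID_STATUSES]
    if unknown:
        raise ValueError(f"Unknown status value(s): {unknown}")
    return ",".join(normalized)

print(status_filter("recruiting", "Not yet recruiting"))
```

The returned string can be passed directly as the `filter.overallStatus` parameter.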

Study Phase Values

Phase	Description
`EARLY_PHASE1`	Early Phase 1 (formerly Phase 0)
`PHASE1`	Phase 1 — safety and dosing
`PHASE2`	Phase 2 — efficacy and side effects
`PHASE3`	Phase 3 — large-scale efficacy
`PHASE4`	Phase 4 — post-market surveillance
`NA`	Not applicable (non-drug studies)
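
The `phases` field is a list, so a study spanning two phases carries two codes. A small formatter, sketched below with the illustrative names `PHASE_LABELS` and `format_phases`, turns the codes from the table above into display text; unknown codes are passed through unchanged.

```python
PHASE_LABELS = {
    "EARLY_PHASE1": "Early Phase 1",
    "PHASE1": "Phase 1",
    "PHASE2": "Phase 2",
    "PHASE3": "Phase 3",
    "PHASE4": "Phase 4",
    "NA": "Not applicable",
}

def format_phases(phases):
    """Turn a phases list such as ['PHASE2', 'PHASE3'] into a display string."""
    if not phases:
        return "Not specified"  # the field is missing or empty for some studies
    return "/".join(PHASE_LABELS.get(p, p) for p in phases)

print(format_phases(["PHASE2", "PHASE3"]))
print(format_phases([]))
```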

Query Parameters Reference

Parameter	Type	Description	Example
`query.cond`	string	Condition/disease	`lung cancer`
`query.intr`	string	Intervention/drug	`Pembrolizumab`
`query.locn`	string	Geographic location	`New York`
`query.spons`	string	Sponsor name	`National Cancer Institute`
`query.term`	string	General full-text search	`immunotherapy`
`filter.overallStatus`	string	Status filter (comma-separated)	`RECRUITING,COMPLETED`
`filter.phase`	string	Phase filter	`PHASE2,PHASE3`
`filter.ids`	string	NCT ID filter	`NCT04852770`
`sort`	string	Sort order	`LastUpdatePostDate:desc`
`pageSize`	int	Results per page (max 1000)	`100`
`pageToken`	string	Pagination token	(from previous response)
`format`	string	Response format	`json` or `csv`
`countTotal`	boolean	Return `totalCount` with the first page (default `false`)	`true`

Sort options: LastUpdatePostDate, EnrollmentCount, StartDate, StudyFirstPostDate — each with :asc or :desc.

Core API

1. Search by Condition

results = ct_search({
    "query.cond": "type 2 diabetes",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20,
    "sort": "LastUpdatePostDate:desc",
    "countTotal": "true",  # required for totalCount to appear in the response
})
print(f"Found {results['totalCount']} recruiting diabetes trials")
for study in results['studies'][:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    print(f"  {nct}: {title}")

2. Search by Intervention/Drug

# Find Phase 3 trials testing Pembrolizumab
results = ct_search({
    "query.intr": "Pembrolizumab",
    "filter.overallStatus": "RECRUITING,ACTIVE_NOT_RECRUITING",
    "filter.phase": "PHASE3",
    "pageSize": 50,
    "countTotal": "true",  # required for totalCount to appear in the response
})
print(f"Phase 3 Pembrolizumab trials: {results['totalCount']}")

3. Search by Location

results = ct_search({
    "query.cond": "cancer",
    "query.locn": "New York",
    "filter.overallStatus": "RECRUITING",
    "pageSize": 20
})

# Extract location details
for study in results['studies'][:3]:
    locs = study['protocolSection'].get('contactsLocationsModule', {}).get('locations', [])
    for loc in locs:
        if 'New York' in loc.get('city', ''):
            print(f"  {loc.get('facility')}: {loc['city']}, {loc.get('state', '')}")

4. Search by Sponsor

results = ct_search({
    "query.spons": "National Cancer Institute",
    "pageSize": 20
})

for study in results['studies'][:5]:
    sponsor_mod = study['protocolSection']['sponsorCollaboratorsModule']
    lead = sponsor_mod['leadSponsor']['name']
    collabs = [c['name'] for c in sponsor_mod.get('collaborators', [])]
    print(f"  Lead: {lead}, Collaborators: {collabs}")

5. Retrieve Study Details by NCT ID

nct_id = "NCT04852770"
response = requests.get(f"{CT_API}/studies/{nct_id}", timeout=30)
response.raise_for_status()
study = response.json()

# Extract key information
proto = study['protocolSection']
print(f"Title: {proto['identificationModule']['briefTitle']}")
print(f"Status: {proto['statusModule']['overallStatus']}")

# Eligibility criteria
elig = proto.get('eligibilityModule', {})
print(f"Ages: {elig.get('minimumAge')} - {elig.get('maximumAge')}")
print(f"Sex: {elig.get('sex')}")
print(f"Criteria:\n{elig.get('eligibilityCriteria', 'N/A')[:300]}")
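
For patient matching, the age bounds usually need to be compared numerically. The sketch below assumes `minimumAge` and `maximumAge` are strings of the form "18 Years" or "6 Months"; the unit names, the conversion factors, and both helper names are assumptions of this example, not something the API guarantees. Anything that does not fit the pattern is treated as an open bound.

```python
import re

# Approximate conversion factors to years; unit names are an assumption
_UNIT_IN_YEARS = {"year": 1.0, "month": 1 / 12, "week": 1 / 52, "day": 1 / 365}

def age_to_years(age_text):
    """Convert a string such as '18 Years' to years as a float; None if unparseable."""
    if not age_text:
        return None
    match = re.fullmatch(r"\s*(\d+)\s+([A-Za-z]+?)s?\s*", age_text)
    if not match:
        return None
    factor = _UNIT_IN_YEARS.get(match.group(2).lower())
    return int(match.group(1)) * factor if factor is not None else None

def age_eligible(patient_age_years, eligibility):
    """Check an age against an eligibilityModule dict; missing bounds count as open."""
    low = age_to_years(eligibility.get("minimumAge"))
    high = age_to_years(eligibility.get("maximumAge"))
    if low is not None and patient_age_years < low:
        return False
    if high is not None and patient_age_years > high:
        return False
    return True

print(age_eligible(40, {"minimumAge": "18 Years", "maximumAge": "65 Years"}))
```

This is a coarse pre-filter only: the upper bound is compared as an exact number of years, and the free-text eligibility criteria still need to be read for a real eligibility decision.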

6. Pagination for Large Result Sets

all_studies = []
page_token = None
max_pages = 10

for page in range(max_pages):
    params = {
        "query.cond": "cancer",
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1000,
    }
    if page_token:
        params["pageToken"] = page_token

    results = ct_search(params)
    all_studies.extend(results['studies'])
    page_token = results.get('nextPageToken')

    if not page_token:
        break
    time.sleep(1.5)  # respect rate limits

print(f"Retrieved {len(all_studies)} studies across {page + 1} pages")
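
The same token-following loop can be written as a generator, which lets callers stop early without holding every page in memory. The sketch below takes the fetch function as an argument so it can run offline; `iter_studies`, `fake_fetch`, and the page contents are all invented for this example. In real use, `ct_search` would be passed as `fetch_page` with a non-zero `delay`.

```python
import time

def iter_studies(fetch_page, base_params, max_pages=10, delay=0.0):
    """Yield studies page by page, following nextPageToken until it is absent."""
    params = dict(base_params)  # copy so the caller's dict is not modified
    for _ in range(max_pages):
        page = fetch_page(params)
        yield from page.get("studies", [])
        token = page.get("nextPageToken")
        if not token:
            return
        params["pageToken"] = token
        if delay:
            time.sleep(delay)  # pause between requests to stay under the rate limit

# Stand-in for ct_search so the sketch runs without network access
_FAKE_PAGES = {
    None: {"studies": [{"id": 1}, {"id": 2}], "nextPageToken": "t1"},
    "t1": {"studies": [{"id": 3}]},
}

def fake_fetch(params):
    return _FAKE_PAGES[params.get("pageToken")]

collected = list(iter_studies(fake_fetch, {"query.cond": "cancer", "pageSize": 2}))
print(len(collected))
```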

7. Export to CSV

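# A single request returns at most pageSize rows
response = requests.get(f"{CT_API}/studies", params={
    "query.cond": "heart disease",
    "filter.overallStatus": "RECRUITING",
    "format": "csv",
    "pageSize": 1000
}, timeout=60)
response.raise_for_status()  # fail loudly instead of writing an error body to disk

# newline="" stops Python from translating the line endings already in the CSV text
with open("heart_disease_trials.csv", "w", encoding="utf-8", newline="") as f:
    f.write(response.text)
print("Exported to heart_disease_trials.csv")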
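
Once exported, the CSV text can be read back with the standard `csv` module rather than split by hand, since titles may contain commas. The sample text below is a made-up stand-in for `response.text`; its column names are placeholders, so inspect the header row of a real export before relying on any particular column.

```python
import csv
import io

# Made-up stand-in for response.text; real exports have their own headers
sample_csv = (
    "NCT Number,Study Title,Study Status\n"
    'NCT00000000,"Example Study, Part A",RECRUITING\n'
)

def read_export(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

rows = read_export(sample_csv)
print(list(rows[0].keys()))    # discover the available columns first
print(rows[0]["Study Title"])  # the quoted comma stays inside one field
```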

Common Workflows

Workflow 1: Multi-Criteria Trial Discovery

import requests, time

CT_API = "https://clinicaltrials.gov/api/v2"

def ct_search(params):
    response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Step 1: Search with multiple filters
results = ct_search({
    "query.cond": "lung cancer",
    "query.intr": "immunotherapy",
    "query.locn": "California",
    "filter.overallStatus": "RECRUITING,NOT_YET_RECRUITING",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc",
    "countTotal": "true",  # required for totalCount to appear in the response
})
print(f"Total matches: {results['totalCount']}")

# Step 2: Filter by phase
phase23 = [
    s for s in results['studies']
    if any(p in ['PHASE2', 'PHASE3']
           for p in s['protocolSection'].get('designModule', {}).get('phases', []))
]
print(f"Phase 2/3 trials: {len(phase23)}")

# Step 3: Extract summaries
for study in phase23[:5]:
    proto = study['protocolSection']
    nct = proto['identificationModule']['nctId']
    title = proto['identificationModule']['briefTitle']
    enrollment = proto.get('designModule', {}).get('enrollmentInfo', {}).get('count', 'N/A')
    print(f"  {nct}: {title} (n={enrollment})")

Workflow 2: Completed Trials with Results Analysis

# Step 1: Find completed trials with posted results
results = ct_search({
    "query.cond": "alzheimer disease",
    "filter.overallStatus": "COMPLETED",
    "pageSize": 100,
    "sort": "LastUpdatePostDate:desc"
})

with_results = [s for s in results['studies'] if s.get('hasResults', False)]
print(f"Completed with results: {len(with_results)} / {len(results['studies'])}")

# Step 2: Get detailed results for top trial
if with_results:
    nct = with_results[0]['protocolSection']['identificationModule']['nctId']
    resp = requests.get(f"{CT_API}/studies/{nct}", timeout=30)
    resp.raise_for_status()  # surface HTTP errors before parsing JSON
    detail = resp.json()

    if 'resultsSection' in detail:
        outcomes = detail['resultsSection'].get('outcomeMeasuresModule', {})
        measures = outcomes.get('outcomeMeasures', [])
        for m in measures[:3]:
            print(f"  Outcome: {m.get('title')}")
            print(f"  Type: {m.get('type')}")

Workflow 3: Sponsor Portfolio Comparison

sponsors = ["Pfizer", "Novartis", "Roche"]
for sponsor in sponsors:
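    results = ct_search({
        "query.spons": sponsor,
        "filter.overallStatus": "RECRUITING",
        "pageSize": 1,          # only the count is needed, not the studies
        "countTotal": "true"    # required for totalCount to appear in the response
    })
    print(f"{sponsor}: {results['totalCount']} recruiting trials")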
    time.sleep(1.5)

Common Recipes

Recipe: Rate-Limited Bulk Search

def ct_search_with_retry(params, max_retries=3):
    """Search with a fixed wait on HTTP 429 and exponential backoff on network errors."""
    for attempt in range(max_retries):
        last_attempt = attempt == max_retries - 1
        try:
            response = requests.get(f"{CT_API}/studies", params=params, timeout=30)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            # Only rate-limit responses are retried; other HTTP errors propagate
            if e.response.status_code != 429 or last_attempt:
                raise
            wait = 60
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except requests.exceptions.RequestException:
            if last_attempt:
                raise
            time.sleep(2 ** attempt)
    raise RuntimeError("max_retries must be at least 1")
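
A fixed wait is simple but can be longer than necessary. One alternative, sketched below, computes the delay from the attempt number and prefers a numeric `Retry-After` header when one is available. Whether ClinicalTrials.gov sends that header is not confirmed here, and `backoff_delay` with its defaults is an illustrative helper rather than part of any library.

```python
import random

def backoff_delay(attempt, retry_after=None, base=2.0, cap=60.0, jitter=0.0):
    """Seconds to wait before retrying after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Numeric Retry-After value in seconds, if the server sent one
            return min(float(retry_after), cap)
        except ValueError:
            pass  # e.g. an HTTP-date value; fall back to exponential backoff
    delay = min(base * (2 ** attempt), cap)
    # Optional random jitter spreads out retries from parallel clients
    return delay + random.uniform(0, jitter)

for attempt in range(4):
    print(attempt, backoff_delay(attempt))
```

In a retry loop this would replace the hard-coded wait, for example `time.sleep(backoff_delay(attempt, e.response.headers.get("Retry-After")))`.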

Recipe: Extract Study Summary

def extract_summary(study):
    proto = study.get('protocolSection', {})
    ident = proto.get('identificationModule', {})
    status = proto.get('statusModule', {})
    design = proto.get('designModule', {})
    return {
        'nct_id': ident.get('nctId'),
        'title': ident.get('officialTitle') or ident.get('briefTitle'),
        'status': status.get('overallStatus'),
        'phases': design.get('phases', []),
        'enrollment': design.get('enrollmentInfo', {}).get('count'),
        'last_update': status.get('lastUpdatePostDateStruct', {}).get('date')
    }

# Usage
for study in results['studies'][:3]:
    s = extract_summary(study)
    print(f"{s['nct_id']}: {s['status']} | Phase: {s['phases']} | n={s['enrollment']}")
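
Summaries in this shape are convenient for simple trend counts. The sketch below tallies studies by status and by phase with `collections.Counter`; the `tally` helper, the placeholder labels `MISSING` and `UNSPECIFIED`, and the sample records are all invented for illustration.

```python
from collections import Counter

def tally(summaries):
    """Count studies per status and per phase from extract_summary-style dicts."""
    by_status = Counter(s.get("status") or "MISSING" for s in summaries)
    by_phase = Counter()
    for s in summaries:
        # A study listing two phases is counted once under each phase
        by_phase.update(s.get("phases") or ["UNSPECIFIED"])
    return by_status, by_phase

# Made-up summaries in the shape returned by extract_summary
sample_summaries = [
    {"nct_id": "NCT00000001", "status": "RECRUITING", "phases": ["PHASE2", "PHASE3"]},
    {"nct_id": "NCT00000002", "status": "RECRUITING", "phases": ["PHASE3"]},
    {"nct_id": "NCT00000003", "status": "COMPLETED", "phases": []},
]

by_status, by_phase = tally(sample_summaries)
print(dict(by_status))
print(dict(by_phase))
```

Because multi-phase studies count under each phase, the phase totals can exceed the number of studies.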

Recipe: Safe Field Navigation

def safe_get(study, *keys, default='N/A'):
    """Navigate nested study JSON safely."""
    current = study
    for key in keys:
        if isinstance(current, dict):
            current = current.get(key)
        else:
            return default
        if current is None:
            return default
    return current

# Usage — handles missing fields gracefully
nct = safe_get(study, 'protocolSection', 'identificationModule', 'nctId')
phases = safe_get(study, 'protocolSection', 'designModule', 'phases', default=[])
enrollment = safe_get(study, 'protocolSection', 'designModule', 'enrollmentInfo', 'count')

Key Parameters

Parameter	Endpoint	Default	Description
`query.cond`	search	—	Condition/disease search term
`query.intr`	search	—	Intervention/drug search term
`query.locn`	search	—	Geographic location filter
`query.spons`	search	—	Sponsor/organization filter
`query.term`	search	—	General full-text search
`filter.overallStatus`	search	all	Comma-separated status values
`filter.phase`	search	all	Comma-separated phase values
`pageSize`	search	10	Results per page (max 1000)
`countTotal`	search	`false`	Set to `true` to include `totalCount` in the first page
`sort`	search	relevance	`{field}:{asc|desc}`
`format`	both	`json`	`json` or `csv`
`timeout`	(client)	30s	Set in requests call

Troubleshooting

Problem	Cause	Solution
429 Too Many Requests	Rate limit exceeded (~50/min)	Wait 60s; use max `pageSize=1000`; implement exponential backoff
Empty studies array	No trials match filters	Broaden search (remove status/phase filters); check spelling
400 Bad Request	Invalid parameter value	Verify status/phase values match enumeration exactly (e.g., `RECRUITING` not `recruiting`)
Missing `resultsSection`	Trial has no posted results	Check `study['hasResults']` before accessing results
KeyError on nested field	Not all trials have all modules	Use `.get()` with defaults or `safe_get` helper (see Recipes)
Pagination stops early	`nextPageToken` absent	All results retrieved; request `countTotal=true` and compare `totalCount` with the collected count
CSV format differs from JSON	Different field structure	CSV flattens nested structure; use JSON for programmatic access
Timeout on large exports	CSV with many results	Increase timeout; paginate with `pageSize=1000` instead

Best Practices

Use maximum page size (1000) for bulk retrieval to minimize request count against rate limit
Always check hasResults before accessing resultsSection — most trials have no posted results
Navigate safely with .get() chains — not all trials populate all modules (especially contactsLocationsModule, armsInterventionsModule)
Specify multiple status values with commas (e.g., RECRUITING,NOT_YET_RECRUITING) — don't make separate requests per status
Use sort=LastUpdatePostDate:desc by default — returns most recently updated trials first
Date interpretation: lastUpdatePostDateStruct.date is an ISO 8601 string; the type field indicates ACTUAL vs ESTIMATED
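
The date guidance above can be sketched as a small parser. It assumes a date struct is a dict with a `date` string and an optional `type`, and that some dates may be partial (year and month only); the padding of missing parts with 1 and the name `parse_ct_date` are choices made for this example.

```python
from datetime import date

def parse_ct_date(date_struct):
    """Return (date, is_estimated) from a struct like {'date': '2024-03', 'type': 'ESTIMATED'}."""
    text = (date_struct or {}).get("date")
    if not text:
        return None, False
    parts = [int(p) for p in text.split("-")]
    # Pad a missing month or day with 1 so partial dates still sort sensibly
    year, month, day = (parts + [1, 1])[:3]
    is_estimated = date_struct.get("type") == "ESTIMATED"
    return date(year, month, day), is_estimated

print(parse_ct_date({"date": "2024-03-15", "type": "ACTUAL"}))
print(parse_ct_date({"date": "2024-03", "type": "ESTIMATED"}))
```

Returning the estimated flag alongside the date keeps projected dates from being mistaken for actual ones in timeline analyses.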

Related Skills

pubmed-database — Published literature search complementary to trial registry data
chembl-database-bioactivity — Compound bioactivity data for drugs under investigation
bioservices-multi-database — Alternative database access via unified Python interface

References

ClinicalTrials.gov API documentation: https://clinicaltrials.gov/data-api/api
API migration guide (v1→v2): https://clinicaltrials.gov/data-api/about-api/api-migration
ClinicalTrials.gov homepage: https://clinicaltrials.gov/
OpenAPI specification: https://clinicaltrials.gov/data-api/about-api/api-spec

Bundled Resources

Self-contained entry. Original total: 866 lines (SKILL.md 507 + api_reference.md 359). Scripts: 216 lines (query_clinicaltrials.py).

Original file disposition:

SKILL.md (507 lines) → Core API modules 1-7 (condition, intervention, location, sponsor, details, pagination, CSV export). "Core Capabilities" sections 1-10 consolidated: Search by Condition → Module 1, Search by Intervention → Module 2, Geographic Search → Module 3, Search by Sponsor → Module 4, Retrieve Detailed Study → Module 5, Pagination → Module 6, Data Export → Module 7, Combined Query → Workflow 1, Extract Summary → Recipe. "Resources" section stub → removed, content consolidated inline. Per-use-case disposition: Patient Matching → When to Use bullet + Workflow 1; Research Analysis → When to Use + Workflow 2; Drug Tracking → When to Use + Module 2; Geographic Search → Module 3; Sponsor Tracking → Module 4 + Workflow 3; Data Export → Module 7; Trial Monitoring → When to Use bullet; Eligibility Screening → Module 5
references/api_reference.md (359 lines) → Fully consolidated inline: endpoint parameters → Key Concepts "Query Parameters Reference" table; status/phase values → Key Concepts tables; response structure → Key Concepts "Response Data Structure" table; HTTP error codes → Troubleshooting table; rate limit guidance → Prerequisites + Best Practices; use cases → duplicated main SKILL.md examples, absorbed into Core API; data standards (ISO 8601, CommonMark) → Prerequisites note. Error handling patterns → Recipes "Rate-Limited Bulk Search"
scripts/query_clinicaltrials.py (216 lines) → Helper function pattern: search_studies() → Quick Start ct_search() helper; get_study_details() → Module 5 inline; search_with_all_results() → Module 6 pagination pattern; extract_study_summary() → Recipe "Extract Study Summary". Thin-wrapper shortcut applied — each function was a thin wrapper around requests.get()

Retention: ~465 lines / 866 original (excl. scripts) = ~54%.
