name: "uspto-database" description: "Access USPTO patent data via PatentsView REST API and Google Patents Public Data (BigQuery). Search by inventor, assignee, CPC, or keywords; download metadata and claims; analyze portfolios; track tech trends. For IP landscape analysis, competitor monitoring, prior art search, and tech forecasting in life sciences and biotech." license: "CC0-1.0"
uspto-database
Overview
The USPTO provides two primary programmatic access points for patent data: the PatentsView API (REST, free, no key required for basic use) for structured queries by inventor, assignee, CPC classification, and keywords; and Google Patents Public Data (BigQuery public dataset) for large-scale analytics across the full patent corpus. Both expose data under the CC0 Public Domain Dedication. This skill covers Python-based access patterns for both, plus basic patent portfolio analytics.
When to Use
- Prior art search: Finding existing patents relevant to a technology before filing or to assess freedom-to-operate.
- Competitor IP landscape analysis: Querying all patents from a specific assignee (company or institution) to map their technology portfolio.
- CPC classification search: Finding patents in a specific technology area using Cooperative Patent Classification codes (e.g., C12N for nucleotides/genetic engineering).
- Inventor network analysis: Identifying prolific inventors in a field and their institutional affiliations.
- Technology trend tracking: Counting patent filings by year and technology category to identify emerging areas.
- Life sciences IP analysis: Searching biotech-specific classifications (A61K for pharmaceuticals, C12N for genetics, G16B for bioinformatics).
- For full-text patent PDF downloads, use the USPTO Bulk Data Storage System (BDSS) or Google Patents direct links.
- Rate limits: PatentsView API allows 45 requests/minute without an API key; request a free key for 45 req/min with higher daily limits.
Prerequisites
- Python packages:
requests,pandas,matplotlib - Optional:
google-cloud-bigqueryfor Google Patents Public Data queries - Data requirements: No account needed for PatentsView basic queries; Google Cloud account required for BigQuery
- Rate limits: PatentsView — 45 requests/minute (unauthenticated), higher with free API key
pip install requests pandas matplotlib
pip install google-cloud-bigquery # optional: for BigQuery access
Quick Start
import requests
import pandas as pd
# Search PatentsView API: patents assigned to "Genentech" in CPC class C12N
url = "https://api.patentsview.org/patents/query"
payload = {
"q": {"_and": [
{"_contains": {"assignee_organization": "Genentech"}},
{"_contains": {"cpc_subgroup_id": "C12N"}},
]},
"f": ["patent_number", "patent_title", "patent_date", "assignee_organization"],
"o": {"per_page": 25},
}
resp = requests.post(url, json=payload)
data = resp.json()
df = pd.DataFrame(data["patents"])
print(f"Found: {data['total_patent_count']} patents")
print(df[["patent_number", "patent_title", "patent_date"]].head())
Core API
Query Type 1: Search by Assignee (Company / Institution)
Find all patents granted to a specific organization.
import requests
import pandas as pd
def search_by_assignee(assignee_name: str, per_page: int = 100) -> pd.DataFrame:
url = "https://api.patentsview.org/patents/query"
payload = {
"q": {"_contains": {"assignee_organization": assignee_name}},
"f": [
"patent_number", "patent_title", "patent_date",
"patent_abstract", "assignee_organization", "assignee_country",
],
"o": {"per_page": per_page, "sort": [{"patent_date": "desc"}]},
}
resp = requests.post(url, json=payload)
resp.raise_for_status()
data = resp.json()
df = pd.DataFrame(data.get("patents", []))
print(f"Assignee '{assignee_name}': {data.get('total_patent_count', 0)} total patents")
return df
# Example: patents from Broad Institute
df_broad = search_by_assignee("Broad Institute")
print(df_broad[["patent_number", "patent_title", "patent_date"]].head(10))
# Paginate through all results for large portfolios
def search_assignee_all_pages(assignee_name: str, page_size: int = 100) -> pd.DataFrame:
url = "https://api.patentsview.org/patents/query"
all_patents = []
page = 1
while True:
payload = {
"q": {"_contains": {"assignee_organization": assignee_name}},
"f": ["patent_number", "patent_title", "patent_date", "cpc_subgroup_id"],
"o": {"per_page": page_size, "page": page},
}
resp = requests.post(url, json=payload)
data = resp.json()
patents = data.get("patents", [])
if not patents:
break
all_patents.extend(patents)
total = data.get("total_patent_count", 0)
if len(all_patents) >= total:
break
page += 1
df = pd.DataFrame(all_patents)
print(f"Retrieved {len(df)} patents for '{assignee_name}'")
return df
Query Type 2: Search by CPC Classification
CPC (Cooperative Patent Classification) codes organize patents by technology. Life sciences codes include C12N (nucleotides/genetics), A61K (pharmaceuticals), and G16B (bioinformatics).
import requests
import pandas as pd
# Search by CPC subgroup: C12N15 (mutation/genetic engineering)
url = "https://api.patentsview.org/patents/query"
payload = {
"q": {"_begins": {"cpc_subgroup_id": "C12N15"}},
"f": [
"patent_number", "patent_title", "patent_date",
"assignee_organization", "cpc_subgroup_id",
],
"o": {"per_page": 50, "sort": [{"patent_date": "desc"}]},
}
resp = requests.post(url, json=payload)
data = resp.json()
df = pd.DataFrame(data["patents"])
print(f"C12N15 patents: {data['total_patent_count']}")
print(df[["patent_number", "patent_title", "assignee_organization"]].head(10))
# Common life sciences CPC codes
CPC_LIFE_SCIENCES = {
"C12N": "Microorganisms / enzymes / compositions",
"C12N15": "Mutation / genetic engineering",
"C12Q": "Measuring / testing involving enzymes or microorganisms",
"A61K": "Preparations for medical use",
"A61P": "Therapeutic activity of chemical compounds",
"G16B": "Bioinformatics",
"G16H": "Healthcare informatics",
"C07K": "Peptides / proteins",
}
for code, desc in CPC_LIFE_SCIENCES.items():
print(f" {code:10s}: {desc}")
Query Type 3: Full-Text Keyword Search
Search patent titles and abstracts for specific terms.
import requests
import pandas as pd
def keyword_search(keyword: str, per_page: int = 50) -> pd.DataFrame:
url = "https://api.patentsview.org/patents/query"
payload = {
"q": {"_or": [
{"_text_any": {"patent_title": keyword}},
{"_text_any": {"patent_abstract": keyword}},
]},
"f": [
"patent_number", "patent_title", "patent_date",
"patent_abstract", "assignee_organization",
],
"o": {"per_page": per_page, "sort": [{"patent_date": "desc"}]},
}
resp = requests.post(url, json=payload)
resp.raise_for_status()
data = resp.json()
df = pd.DataFrame(data.get("patents", []))
print(f"Keyword '{keyword}': {data.get('total_patent_count', 0)} patents found")
return df
# Search for CRISPR-related patents
df_crispr = keyword_search("CRISPR")
print(df_crispr[["patent_number", "patent_title", "patent_date"]].head(10))
Query Type 4: Inventor Search
Find patents by inventor name or retrieve an inventor's full publication history.
import requests
import pandas as pd
# Search by inventor name
url = "https://api.patentsview.org/inventors/query"
payload = {
"q": {"_and": [
{"inventor_last_name": "Doudna"},
{"inventor_first_name": "Jennifer"},
]},
"f": ["inventor_id", "inventor_first_name", "inventor_last_name",
"inventor_city", "inventor_state", "inventor_country"],
"o": {"per_page": 10},
}
resp = requests.post(url, json=payload)
data = resp.json()
print(f"Found {data.get('total_inventor_count', 0)} inventors matching 'Jennifer Doudna'")
for inv in data.get("inventors", []):
print(f" ID: {inv['inventor_id']}, Location: {inv.get('inventor_city')}, {inv.get('inventor_country')}")
# Get all patents for a specific inventor by inventor_id
inventor_id = "fl:j_ln:doudna-1" # PatentsView inventor ID format
url = "https://api.patentsview.org/patents/query"
payload = {
"q": {"inventor_id": inventor_id},
"f": ["patent_number", "patent_title", "patent_date", "assignee_organization"],
"o": {"per_page": 100, "sort": [{"patent_date": "desc"}]},
}
resp = requests.post(url, json=payload)
data = resp.json()
df = pd.DataFrame(data.get("patents", []))
print(f"Patents for inventor {inventor_id}: {data.get('total_patent_count', 0)}")
print(df.head(5))
Query Type 5: Date Range and Combined Filters
Combine multiple filters for targeted searches.
import requests
import pandas as pd
# Patents in gene therapy (CPC A61K48) filed 2020-2024 by a US assignee
url = "https://api.patentsview.org/patents/query"
payload = {
"q": {"_and": [
{"_begins": {"cpc_subgroup_id": "A61K48"}},
{"_gte": {"patent_date": "2020-01-01"}},
{"_lte": {"patent_date": "2024-12-31"}},
{"_eq": {"assignee_country": "US"}},
]},
"f": [
"patent_number", "patent_title", "patent_date",
"assignee_organization", "patent_num_claims",
],
"o": {"per_page": 100, "sort": [{"patent_date": "desc"}]},
}
resp = requests.post(url, json=payload)
data = resp.json()
df = pd.DataFrame(data.get("patents", []))
print(f"Gene therapy patents 2020-2024 (US assignee): {data.get('total_patent_count', 0)}")
print(df[["patent_number", "patent_title", "patent_date", "assignee_organization"]].head(10))
Query Type 6: Google Patents BigQuery
For large-scale corpus analytics, use the public Google Patents dataset in BigQuery.
from google.cloud import bigquery
client = bigquery.Client(project="YOUR_GCP_PROJECT")
# Count CRISPR patents by year (Google Patents public data)
query = """
SELECT
EXTRACT(YEAR FROM filing_date) AS filing_year,
COUNT(*) AS patent_count,
COUNT(DISTINCT assignee) AS unique_assignees
FROM `patents-public-data.patents.publications`
WHERE
(LOWER(title_localized[SAFE_OFFSET(0)].text) LIKE '%crispr%'
OR LOWER(abstract_localized[SAFE_OFFSET(0)].text) LIKE '%crispr%')
AND filing_date >= '2010-01-01'
AND country_code = 'US'
GROUP BY filing_year
ORDER BY filing_year
"""
df_bq = client.query(query).to_dataframe()
print(df_bq)
print(f"Peak year: {df_bq.loc[df_bq.patent_count.idxmax(), 'filing_year']} "
f"({df_bq.patent_count.max()} patents)")
Key Parameters
| Parameter | Module | Default | Range / Options | Effect |
|---|---|---|---|---|
per_page | PatentsView "o" | 25 | 1–10000 | Results per API call |
page | PatentsView "o" | 1 | 1–max pages | Page number for pagination |
sort | PatentsView "o" | API default | any field + "asc"/"desc" | Sort order of results |
"f" fields | PatentsView | minimal | any valid field list | Fields returned in response (controls payload size) |
"_begins" | query operator | — | field + prefix string | Prefix match (e.g., CPC code prefix) |
"_contains" | query operator | — | field + substring | Substring search (case-insensitive) |
"_text_any" | query operator | — | field + keywords | Full-text search on title/abstract fields |
Best Practices
-
Request only the fields you need: The
"f"(fields) parameter controls what is returned. Requestingpatent_abstractfor thousands of patents significantly increases payload size and latency. -
Always handle pagination for large result sets: PatentsView caps responses at 10,000 per page maximum. For queries returning >10,000 results, use date-range slicing or narrower CPC codes to split the query.
-
Cache API responses to disk: PatentsView is rate-limited; if building a dataset iteratively, save responses to JSON/CSV after each API call.
import json, pathlib cache = pathlib.Path("cache") cache.mkdir(exist_ok=True) cache_file = cache / "genentech_patents.json" if not cache_file.exists(): resp = requests.post(url, json=payload) cache_file.write_text(resp.text) data = json.loads(cache_file.read_text()) -
Use CPC codes for technology-specific searches, not just keywords: Keywords miss synonyms and foreign-language patents; CPC codes are assigned by patent examiners and are more systematic.
-
Validate assignee names: Company names in patent records vary (e.g., "Genentech Inc.", "Genentech, Inc.", "GENENTECH INC"). Use
_containsfor fuzzy matching, then deduplicate in pandas.
Common Workflows
Workflow 1: Technology Landscape Analysis — Filing Trends by Year
Goal: Count patents filed in a CPC class by year and plot the trend.
import requests
import pandas as pd
import matplotlib.pyplot as plt
from collections import defaultdict
def count_patents_by_year(cpc_prefix: str, start_year: int = 2010) -> pd.DataFrame:
url = "https://api.patentsview.org/patents/query"
counts = defaultdict(int)
page = 1
while True:
payload = {
"q": {"_and": [
{"_begins": {"cpc_subgroup_id": cpc_prefix}},
{"_gte": {"patent_date": f"{start_year}-01-01"}},
]},
"f": ["patent_number", "patent_date"],
"o": {"per_page": 10000, "page": page},
}
resp = requests.post(url, json=payload)
patents = resp.json().get("patents", [])
if not patents:
break
for p in patents:
year = p["patent_date"][:4]
counts[year] += 1
total = resp.json().get("total_patent_count", 0)
if sum(counts.values()) >= total:
break
page += 1
df = pd.DataFrame(sorted(counts.items()), columns=["year", "count"])
return df
df_trend = count_patents_by_year("C12N15", start_year=2010)
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(df_trend["year"], df_trend["count"], color="steelblue", edgecolor="white")
ax.set_xlabel("Year")
ax.set_ylabel("Patents granted")
ax.set_title("US Patents: C12N15 (Genetic Engineering) by Year")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("cpc_trend.png", dpi=150)
print(f"Trend plotted: {df_trend['count'].sum()} total patents -> cpc_trend.png")
Workflow 2: Assignee Portfolio Comparison
Goal: Compare patent counts across multiple biotech companies in a target CPC class.
import requests
import pandas as pd
import matplotlib.pyplot as plt
def count_patents_by_assignee(assignees: list, cpc_prefix: str) -> pd.DataFrame:
url = "https://api.patentsview.org/patents/query"
records = []
for assignee in assignees:
payload = {
"q": {"_and": [
{"_contains": {"assignee_organization": assignee}},
{"_begins": {"cpc_subgroup_id": cpc_prefix}},
]},
"f": ["patent_number"],
"o": {"per_page": 1}, # only need total count
}
resp = requests.post(url, json=payload)
total = resp.json().get("total_patent_count", 0)
records.append({"assignee": assignee, "patent_count": total})
print(f" {assignee}: {total} patents")
df = pd.DataFrame(records).sort_values("patent_count", ascending=True)
return df
companies = ["Genentech", "Amgen", "Regeneron", "AstraZeneca", "Novartis"]
df_comp = count_patents_by_assignee(companies, cpc_prefix="A61K")
fig, ax = plt.subplots(figsize=(7, 4))
ax.barh(df_comp["assignee"], df_comp["patent_count"], color="salmon")
ax.set_xlabel("Patent count (A61K)")
ax.set_title("Pharmaceutical Patents by Assignee (CPC A61K)")
plt.tight_layout()
plt.savefig("assignee_comparison.png", dpi=150)
print("Comparison chart saved -> assignee_comparison.png")
Expected Outputs
pd.DataFramewith patent records (columns depend on requested"f"fields)cpc_trend.png— bar chart of patent counts by yearassignee_comparison.png— horizontal bar chart comparing companiestotal_patent_countin API response gives the full corpus size for a query
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
HTTPError 429 Too Many Requests | Exceeded 45 req/min rate limit | Add time.sleep(1.5) between requests; request a free API key |
Empty patents list in response | Query too narrow or field name incorrect | Check field names in PatentsView API docs; test query in the web UI first |
| Results miss known patents | Exact string matching on assignee name | Use _contains instead of _eq; check for name variants |
KeyError: patent_date | Field not requested in "f" list | Add "patent_date" to the "f" array |
| BigQuery auth error | GCP credentials not configured | Run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS |
| CPC prefix returns no results | Invalid CPC code or typo | Verify code at CPC classification browser |
References
- PatentsView API Documentation — full field list and query operator reference
- PatentsView API Explorer — browser-based query builder for testing
- Google Patents Public Data (BigQuery) — schema documentation
- CPC Classification Browser — browse life sciences CPC codes
- USPTO Bulk Data Storage System — full-text patent XML downloads