name: base-academic-search
description: "Search 400M+ open access documents via the BASE search engine API"
metadata:
  openclaw:
    emoji: "🔍"
    category: "literature"
    subcategory: "search"
    keywords: ["BASE", "academic search", "open access", "Bielefeld", "OAI-PMH", "repository aggregator"]
    source: "https://www.base-search.net/"

BASE (Bielefeld Academic Search Engine) API

Overview

BASE is one of the world's largest search engines for academic open access web resources. Operated by Bielefeld University Library, it indexes 400M+ documents from 11,000+ content providers including institutional repositories, preprint servers, and digital libraries. Unlike Google Scholar, BASE provides structured metadata, license information, and full-text links. The API is free with registration.

API Endpoints

Base URL

https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi

Search

# Basic keyword search (JSON response)
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=climate+change+adaptation&format=json&hits=20"

# Search with field filters
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=dctitle:transformer+AND+dcsubject:NLP&format=json"

# Filter by document type and year (field filters go inside the query value)
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=deep+learning+AND+dctypenorm:121+AND+dcyear:2024&format=json"

# Open access only (dcoa:1 is combined with the keywords inside the query value)
curl "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi?\
func=PerformSearch&query=CRISPR+AND+dcoa:1&format=json"
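The same requests can be assembled in Python. The sketch below only builds the URL and does not contact the API; `search_url` is a hypothetical helper name, and the point is that field filters travel inside the single `query` value, which `urlencode` escapes.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"


def search_url(query, fmt="json", hits=None):
    """Build a PerformSearch URL (hypothetical helper, no network access)."""
    params = {"func": "PerformSearch", "query": query, "format": fmt}
    if hits is not None:
        params["hits"] = hits
    # urlencode turns spaces into "+" and escapes ":" as "%3A".
    return f"{BASE_URL}?{urlencode(params)}"


print(search_url("dctitle:transformer AND dcsubject:NLP"))
print(search_url("CRISPR AND dcoa:1", hits=20))
```

The hand-written curl URLs leave `:` unescaped; the percent-encoded form produced here is the standard URL encoding of the same query string.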

Search Fields

Field	Description	Example
`dctitle`	Title	`dctitle:attention+mechanism`
`dccreator`	Author	`dccreator:vaswani`
`dcsubject`	Subject/keywords	`dcsubject:machine+learning`
`dcdescription`	Abstract	`dcdescription:neural+network`
`dcyear`	Publication year	`dcyear:2024`
`dctype`	Document type text	`dctype:article`
`dctypenorm`	Normalized type code	`dctypenorm:121` (journal article)
`dcrights`	Access rights	`dcrights:open`
`dclang`	Language	`dclang:eng`
`dclink`	Source URL	`dclink:arxiv.org`
`dcoa`	Open access status	`dcoa:1` (OA), `dcoa:2` (restricted)
`dcprovider`	Content provider	`dcprovider:arxiv.org`
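A small helper can combine these fields into one query string. This is a minimal sketch: `build_query` is a hypothetical name, and wrapping multi-word values in double quotes assumes BASE accepts Lucene-style phrase syntax, which this document does not confirm.

```python
def build_query(keywords="", **fields):
    """Join free-text keywords and field:value pairs with AND."""
    keywords = keywords.strip()
    parts = [keywords] if keywords else []
    for name, value in fields.items():
        value = str(value).strip()
        # Assumption: a quoted phrase keeps a multi-word value attached
        # to its field prefix.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    return " AND ".join(parts)


print(build_query("transformer", dcsubject="machine learning", dcyear="2024"))
print(build_query(dccreator="vaswani"))
```

The resulting string goes into the `query` parameter unchanged; URL escaping is left to the HTTP client.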

Document Type Codes

Code	Type
`121`	Journal article
`122`	Book / monograph
`14`	Conference paper
`15`	Thesis / dissertation
`17`	Report
`18`	Preprint
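For display purposes, the codes can be turned back into labels. The mapping below copies the table above as given in this document and has not been checked against BASE's own documentation; `describe_types` is a hypothetical helper that accepts either a single code or a list, since the sample response returns `dctypenorm` as a list of strings.

```python
# Labels copied from the table above (unverified against BASE's docs).
DOC_TYPES = {
    "121": "Journal article",
    "122": "Book / monograph",
    "14": "Conference paper",
    "15": "Thesis / dissertation",
    "17": "Report",
    "18": "Preprint",
}


def describe_types(codes):
    """Map dctypenorm codes to labels; unknown codes are kept visible."""
    if isinstance(codes, (str, int)):
        codes = [codes]
    return [DOC_TYPES.get(str(c), f"Unknown ({c})") for c in codes or []]


print(describe_types(["18"]))
print(describe_types(121))
print(describe_types(["999"]))
```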

Query Parameters

Parameter	Description	Default
`func`	Must be `PerformSearch`	Required
`query`	Search query with optional field prefixes	Required
`format`	Response format: `json` or `xml`	`xml`
`hits`	Results per page (max 125)	10
`offset`	Pagination offset	0
`sortby`	Sort: `dcyear desc`, `score desc`	relevance
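Because `hits` is capped at 125, larger result sets need several requests that step `offset` forward. The sketch below only generates the parameter dictionaries; `page_params` is a hypothetical helper, and it assumes `offset` counts records rather than pages.

```python
MAX_HITS = 125  # per-request limit from the table above


def page_params(query, total, page_size=100, fmt="json"):
    """Yield one parameter dict per page until `total` records are covered."""
    page_size = max(1, min(page_size, MAX_HITS))
    for offset in range(0, total, page_size):
        yield {
            "func": "PerformSearch",
            "query": query,
            "format": fmt,
            # The last page may be shorter than page_size.
            "hits": min(page_size, total - offset),
            "offset": offset,  # assumption: a record offset, not a page index
        }


for params in page_params("deep learning", total=250):
    print(params["offset"], params["hits"])
```

Each dictionary can be passed as `params=` to `requests.get`; in practice `total` would come from `numFound` in the first response.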

Response Structure

{
  "response": {
    "numFound": 45200,
    "start": 0,
    "docs": [
      {
        "dctitle": "Attention Is All You Need",
        "dccreator": ["Ashish Vaswani", "Noam Shazeer"],
        "dcyear": "2017",
        "dcsubject": ["machine learning", "attention mechanism"],
        "dcdescription": "The dominant sequence transduction models...",
        "dcidentifier": "https://arxiv.org/abs/1706.03762",
        "dcsource": "arXiv.org",
        "dcprovider": "arxiv.org",
        "dcdocid": "abc123xyz",
        "dcoa": 1,
        "dctypenorm": ["18"],
        "dclang": ["eng"]
      }
    ]
  }
}
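The structure above can be read into typed records. This is a sketch against a trimmed copy of the sample response, not a live call; `BaseRecord` and `parse_response` are hypothetical names, and the string-to-list handling for `dccreator` is a defensive assumption rather than documented behaviour.

```python
import json
from dataclasses import dataclass, field


@dataclass
class BaseRecord:
    title: str
    authors: list = field(default_factory=list)
    year: str = ""
    url: str = ""
    open_access: bool = False


def parse_response(payload):
    """Return (total_hits, records) from a decoded JSON response."""
    body = payload.get("response", {})
    records = []
    for doc in body.get("docs", []):
        authors = doc.get("dccreator") or []
        if isinstance(authors, str):  # assumption: tolerate a bare string
            authors = [authors]
        records.append(BaseRecord(
            title=doc.get("dctitle", ""),
            authors=authors,
            year=str(doc.get("dcyear", "")),
            url=doc.get("dcidentifier", ""),
            open_access=doc.get("dcoa") == 1,
        ))
    return body.get("numFound", 0), records


sample = json.loads("""
{"response": {"numFound": 45200, "start": 0, "docs": [
  {"dctitle": "Attention Is All You Need",
   "dccreator": ["Ashish Vaswani", "Noam Shazeer"],
   "dcyear": "2017",
   "dcidentifier": "https://arxiv.org/abs/1706.03762",
   "dcoa": 1}
]}}
""")
total, records = parse_response(sample)
print(total, records[0].title, records[0].open_access)
```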

Python Usage

from typing import Optional

import requests

BASE_URL = "https://api.base-search.net/cgi-bin/BaseHttpSearchInterface.fcgi"


def search_base(query: str, hits: int = 20,
                doc_type: Optional[int] = None, oa_only: bool = False) -> list:
    """Search BASE and return a list of simplified result dicts.

    doc_type is a dctypenorm code (e.g. 121); oa_only adds dcoa:1 to the query.
    """
    q = query
    if doc_type is not None:
        q += f" AND dctypenorm:{doc_type}"
    if oa_only:
        q += " AND dcoa:1"

    params = {
        "func": "PerformSearch",
        "query": q,
        "format": "json",
        "hits": hits,
        "sortby": "dcyear desc",
    }

    # Set a timeout so a stalled connection cannot hang the caller indefinitely.
    resp = requests.get(BASE_URL, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()

    results = []
    for doc in data.get("response", {}).get("docs", []):
        results.append({
            "title": doc.get("dctitle"),
            "authors": doc.get("dccreator", []),
            "year": doc.get("dcyear"),
            "source": doc.get("dcsource"),
            "url": doc.get("dcidentifier"),
            "abstract": (doc.get("dcdescription") or "")[:300],
            "open_access": doc.get("dcoa") == 1,
            "type": doc.get("dctypenorm", []),
        })
    return results


def search_dissertations(topic: str, lang: str = "eng") -> list:
    """Find dissertations and theses on a topic."""
    query = f"{topic} AND dctypenorm:15 AND dclang:{lang}"
    return search_base(query, hits=50)


def search_by_provider(query: str, provider: str) -> list:
    """Search within a specific content provider."""
    full_query = f"{query} AND dcprovider:{provider}"
    return search_base(full_query)


# Example: find recent open access ML papers
papers = search_base("transformer self-attention", hits=10, oa_only=True)
for p in papers:
    oa = "OA" if p["open_access"] else "restricted"
    print(f"[{p['year']}] {p['title']} ({oa}) — {p['source']}")

# Example: find dissertations on climate modeling
theses = search_dissertations("climate modeling ocean")
for t in theses:
    print(f"[{t['year']}] {t['title']} — {', '.join(t['authors'][:2])}")

BASE vs Other Search Engines

Feature	BASE	Google Scholar	OpenAlex
Records	400M+	Unknown	250M+
Open access focus	Yes	No	Yes
Structured API	Yes	No official API	Yes
License metadata	Yes	No	Partial
Dissertation coverage	Excellent	Good	Limited
Repository-level filtering	Yes	No	No
