name: source-discovery
description: Platform-specific search patterns and strategies for discovering ML/AI resources across arXiv, Semantic Scholar, GitHub, HuggingFace, and Papers With Code
domain: brainstorm
triggers:
- search for papers on
- find repos related to
- search huggingface for
- what's on papers with code for
- discover models for
- find datasets for
- search across platforms
- source discovery
- find implementations of
- what repos exist for
## Overview

This skill provides the search patterns, API syntax, and strategies for discovering ML/AI resources across five major platforms. It is used by the architecture-scout agent and is also useful for manual searches.
## arXiv

### Search syntax

- API endpoint: `http://export.arxiv.org/api/query?search_query=`
- Field prefixes: `ti:` (title), `abs:` (abstract), `au:` (author), `cat:` (category), `all:` (full text)
- Boolean: `AND`, `OR`, `ANDNOT`
- Date filter: `submittedDate:[YYYYMMDD0000+TO+YYYYMMDD2359]`
### Example queries

```
ti:"state space model" AND abs:anomaly
cat:cs.LG AND ti:mamba AND submittedDate:[202401010000+TO+202612310000]
all:"selective scan" AND all:"time series"
```
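Queries like these can be assembled and fetched with only the Python standard library. A minimal sketch, assuming the Atom response format the arXiv API returns; the helper names are illustrative, not part of any official client:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used in responses


def arxiv_query_url(search_query, start=0, max_results=10):
    """Build an arXiv API URL from a raw search_query string."""
    params = urllib.parse.urlencode({
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    })
    return f"{ARXIV_API}?{params}"


def fetch_titles(search_query, max_results=10):
    """Fetch matching entries and return their titles (makes a network call)."""
    url = arxiv_query_url(search_query, max_results=max_results)
    with urllib.request.urlopen(url) as resp:
        feed = ET.parse(resp)
    return [e.findtext(f"{ATOM}title") for e in feed.getroot().iter(f"{ATOM}entry")]
```

`urlencode` handles the quoting, so the field-prefixed query can be written exactly as in the examples above.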
### Category taxonomy (ML-relevant)

| Category | Scope |
|---|---|
| cs.LG | Machine learning, deep learning, optimization |
| cs.AI | General AI, knowledge representation, reasoning |
| cs.CL | NLP, language models |
| cs.CV | Computer vision |
| cs.NE | Neural/evolutionary computing |
| cs.IR | Information retrieval, search, RAG |
| cs.CR | Security, adversarial ML |
| cs.RO | Robotics, embodied AI |
| stat.ML | Statistical machine learning |
| eess.SP | Signal processing (time-series, audio) |
### Web search alternative

When the API is limited, use: `site:arxiv.org "<topic>" "<technique>" 2025 OR 2026`
## Semantic Scholar

### API endpoints

- Search: `https://api.semanticscholar.org/graph/v1/paper/search?query=`
- Paper details: `https://api.semanticscholar.org/graph/v1/paper/{paper_id}`
- Citations: `https://api.semanticscholar.org/graph/v1/paper/{paper_id}/citations`
- References: `https://api.semanticscholar.org/graph/v1/paper/{paper_id}/references`
### Useful fields parameter

`fields=title,abstract,year,citationCount,openAccessPdf,authors,venue,externalIds`
### Search tips

- Natural language queries work better than Boolean operators
- Use the year filter: `&year=2024-2026`
- Use the fieldsOfStudy filter: `&fieldsOfStudy=Computer Science`
- Citation graph traversal: find one key paper, then pull its citations and references
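A sketch of composing these requests with the standard library; the endpoint paths and parameter names are the ones listed above, while the helper names and defaults are illustrative:

```python
import urllib.parse

S2_BASE = "https://api.semanticscholar.org/graph/v1/paper"
FIELDS = "title,abstract,year,citationCount,openAccessPdf,authors,venue,externalIds"


def s2_search_url(query, year="2024-2026", fields=FIELDS, limit=20):
    """Relevance search with year and fields filters applied."""
    params = urllib.parse.urlencode(
        {"query": query, "year": year, "fields": fields, "limit": limit}
    )
    return f"{S2_BASE}/search?{params}"


def s2_citations_url(paper_id, fields="title,year,citationCount"):
    """Citations of one paper, for snowballing forward in time."""
    return f"{S2_BASE}/{paper_id}/citations?fields={fields}"
```

Pairing `s2_citations_url` with the matching `/references` endpoint gives both directions of the citation graph from a single seed paper.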
### Web search alternative

`site:semanticscholar.org "<topic>" "<technique>"`
## GitHub

### Search syntax

- Repos: `https://github.com/search?type=repositories&q=`
- Code: `https://github.com/search?type=code&q=`
### Useful filters

```
<topic> stars:>50 pushed:>2025-01-01 language:python
<topic> stars:>100 language:python topic:machine-learning
<architecture>+<task> in:readme stars:>20
```
### Sorting

- `sort:stars` — most popular
- `sort:updated` — most recently active
- Default (no sort) — best match
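The qualifiers and in-query `sort:` form above can be composed mechanically. A sketch (the helper and its parameters are illustrative; the qualifier syntax is GitHub's):

```python
import urllib.parse


def github_repo_search_url(query, min_stars=50, pushed_after=None,
                           language=None, sort=None):
    """Compose a GitHub repository search URL from qualifier pieces."""
    parts = [query, f"stars:>{min_stars}"]
    if pushed_after:
        parts.append(f"pushed:>{pushed_after}")
    if language:
        parts.append(f"language:{language}")
    if sort:
        parts.append(f"sort:{sort}")  # e.g. "stars" or "updated"; omit for best match
    q = urllib.parse.quote_plus(" ".join(parts))
    return f"https://github.com/search?type=repositories&q={q}"
```

Example: `github_repo_search_url("mamba anomaly", pushed_after="2025-01-01", language="python", sort="stars")` yields a maintained-Python-repos-by-stars search.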
### What to look for in repos
- Star count + trajectory (growing fast?)
- Last commit date (is it maintained?)
- README quality (documentation = usability)
- Issues/PRs (active community?)
- License (can you use it?)
- Dependencies (PyTorch? JAX? compatible with your stack?)
### Web search alternative

`site:github.com "<topic>" "<technique>" readme`
## HuggingFace

### Hub search

- Models: `https://huggingface.co/models?search=<query>&sort=downloads`
- Datasets: `https://huggingface.co/datasets?search=<query>&sort=downloads`
- Spaces: `https://huggingface.co/spaces?search=<query>&sort=likes`
### Useful filters
- Models: filter by task (text-classification, image-classification, etc.), library (pytorch, jax), language
- Datasets: filter by task, size, language, modality
- Spaces: filter by SDK (gradio, streamlit)
### What to look for
- Download count (adoption signal)
- Model card quality (documentation)
- Task tags (correct categorization)
- Linked paper (academic backing)
- Community discussions (known issues)
### API access

```python
from huggingface_hub import HfApi

api = HfApi()
# Most-downloaded models matching the query, descending
models = api.list_models(search="mamba", sort="downloads", direction=-1)
datasets = api.list_datasets(search="anomaly detection", sort="downloads")
```
### Web search alternative

`site:huggingface.co "<topic>" model OR dataset`
## Papers With Code

### Key pages

- Tasks: `https://paperswithcode.com/task/<task-slug>`
- Methods: `https://paperswithcode.com/method/<method-slug>`
- SOTA: `https://paperswithcode.com/sota/<benchmark-slug>`
- Search: `https://paperswithcode.com/search?q=<query>`
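The slugs follow the common lowercase-hyphen pattern; a sketch of deriving them (an assumption about the site's slug scheme, not an official rule, so verify the resulting URL resolves):

```python
import re


def pwc_slug(name):
    """Lowercase, collapse non-alphanumerics to hyphens, trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def pwc_task_url(task_name):
    """Guess the task page URL for a human-readable task name."""
    return f"https://paperswithcode.com/task/{pwc_slug(task_name)}"
```

Example: `pwc_task_url("Time Series Forecasting")` produces `https://paperswithcode.com/task/time-series-forecasting`.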
### What to look for
- SOTA tables — who's on top, by how much, with what method
- Method pages — linked papers + code repos
- Task taxonomy — find adjacent tasks you might not have considered
- Benchmark pages — standard evaluation protocols
### Web search alternative

`site:paperswithcode.com "<topic>" "<technique>"`
## General Search Strategies
### Snowball search
- Start with 1-2 key papers
- Pull their references (what did they build on?)
- Pull their citations (who built on them?)
- Repeat for the most relevant results
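The loop above can be sketched as a breadth-limited traversal. `fetch_neighbors` is a stand-in for whatever returns a paper's references and citations (e.g. the Semantic Scholar endpoints), so this is a shape, not a client:

```python
def snowball(seed_ids, fetch_neighbors, rounds=2, per_round=5):
    """Expand a seed set by repeatedly pulling related papers.

    fetch_neighbors(paper_id) -> list of related paper ids (caller-supplied,
    typically citations + references). per_round caps fan-out per paper.
    """
    seen = set(seed_ids)
    frontier = list(seed_ids)
    for _ in range(rounds):
        next_frontier = []
        for pid in frontier:
            for neighbor in fetch_neighbors(pid)[:per_round]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen
```

Keeping `rounds` small (2-3) matters: citation graphs fan out quickly, and relevance drops off past the second hop.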
### Author tracking
When you find a relevant paper, check the first/last author's recent publications — they likely have follow-up work.
### Trending detection
- GitHub: sort by "recently created" + "most stars this week"
- HuggingFace: sort by "trending"
- arXiv: check cs.LG/cs.AI daily listings for keyword matches
- Twitter/X: search for paper titles or arXiv IDs for community discussion
### Cross-platform verification

Paper found on arXiv → check GitHub for code → check HuggingFace for models → check Papers With Code for benchmarks.
### Query expansion
Start with the exact topic, then expand:
- Exact: "Mamba anomaly detection"
- Component: "state space model" + "anomaly detection" separately
- Adjacent: "selective scan" + "time series" or "out-of-distribution detection"
- Competitor: "transformer anomaly detection" (to find what you'll compare against)
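Once a topic is split into its components, the four tiers can be generated mechanically. A sketch mirroring the list above (the tier names and helper are illustrative, not a fixed taxonomy):

```python
def expand_queries(exact, components, adjacent, competitors):
    """Return search queries ordered from narrow to broad.

    components: pairs of terms to search together, e.g.
    ("state space model", "anomaly detection").
    """
    tiers = {
        "exact": [exact],
        "component": [" ".join(pair) for pair in components],
        "adjacent": adjacent,
        "competitor": competitors,
    }
    return [q for tier in tiers.values() for q in tier]


queries = expand_queries(
    exact="Mamba anomaly detection",
    components=[("state space model", "anomaly detection")],
    adjacent=["selective scan time series"],
    competitors=["transformer anomaly detection"],
)
```

Running the tiers in order keeps the first results tightly on-topic while the later ones surface adjacent work and baselines to compare against.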