---
name: semantic-grep
description: In-process semantic search over text files or in-memory strings, using Gemini embeddings via the CF AI Gateway. Use when user wants fuzzy/conceptual search where exact-keyword grep would miss — "sessions discussing regulatory constraints", "code about retry logic", "notes mentioning burnout even if the word isn't there". Complements searching-codebases (regex/AST) and extracting-keywords (YAKE). Do NOT use when an exact string/regex match is what's wanted — grep/rg wins on speed and precision there.
metadata:
  version: 0.1.1
---
# Semantic Grep
jina-grep-style semantic search, done in-process via Python rather than as an external CLI. Embeds query + corpus chunks with gemini-embedding-001, ranks by cosine similarity, returns grep-format output.
## When Semantic Search Helps
The core trade-off (lifted from jina-grep-cli's own docs and validated in testing):
| Task | Tool |
|---|---|
| Known exact string, filename, or regex | grep / rg / searching-codebases |
| "What files discuss concept X" when X may not appear verbatim | semantic-grep |
| Hybrid: prefilter with grep, rerank by concept | grep → semantic_grep on prefiltered Chunks (see Pipe-mode Rerank Pattern) |
Regression test results (workshop session corpus, 135 docs):
- "handling regulatory constraints" → top hit "Engineering AI Systems Under Sovereignty Constraints" (0.67). ✓
- "sessions about GEPA" → top hit "Gemma, DeepMind's Family of Open Models" (0.69). ✗ — false positive on phonetic neighbor. GEPA is mentioned verbatim in one session description; grep would find it correctly.
Rule: when the user query reads like a named entity or keyword, try grep first. Only reach for semantic-grep when paraphrase/concept matching is actually needed.
## Setup
Credentials via `proxy.env` (Cloudflare AI Gateway w/ BYOK — same pattern as invoking-gemini):

```
CF_ACCOUNT_ID=...
CF_GATEWAY_ID=...
CF_API_TOKEN=...
```
Direct-API fallback: `GOOGLE_API_KEY` or `GEMINI_API_KEY` env var. No dependencies beyond `requests` + `numpy`.
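A minimal loader sketch, assuming `proxy.env` holds plain `KEY=value` lines; the filename, location, and precedence here are illustrative, not part of the module's API:

```python
import os
from pathlib import Path

# Illustrative only: load KEY=value lines from proxy.env into the environment,
# then check for a direct Google/Gemini key if the gateway vars are absent.
def load_proxy_env(path="proxy.env"):
    if not Path(path).exists():
        return
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())

load_proxy_env()
has_gateway = all(os.environ.get(k) for k in
                  ("CF_ACCOUNT_ID", "CF_GATEWAY_ID", "CF_API_TOKEN"))
direct_key = os.environ.get("GOOGLE_API_KEY") or os.environ.get("GEMINI_API_KEY")
assert has_gateway or direct_key, "no credentials found"
```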
## Quick Start
```python
import sys
sys.path.insert(0, '/mnt/skills/user/semantic-grep/scripts')
from semantic_grep import semantic_grep, format_grep

# Directory of .txt files
results = semantic_grep("error handling under load", "/path/to/notes",
                        top_k=5, granularity="paragraph")
print(format_grep(results))
# notes/incidents.txt:42: When the queue depth exceeds... [0.71]
# notes/postmortem.txt:8: Under sustained traffic we saw... [0.68]
```
## Core API
### `semantic_grep(query, corpus, *, top_k=10, threshold=None, ...)`

Main search function.

- `query` (str) — the search query (embedded with `RETRIEVAL_QUERY` task type)
- `corpus` (str | Path | list[Chunk]) — a file, directory, or pre-chunked list
- `top_k` (int | None) — max results; `None` = all above threshold
- `threshold` (float | None) — cosine similarity cutoff; `None` = no filter (top_k only)
- `granularity` (`"paragraph"` | `"line"`) — how to chunk files (default paragraph)
- `include` (str) — filename-glob filter when `corpus` is a directory (default `"*.txt"`). Matches against `Path.name` only, not the full path — `"*.md"` works, `"docs/*.md"` does not.
- `model` (str) — default `"gemini-embedding-001"`
- `dim` (int) — 128 / 768 / 1536 / 3072 (default 768; MRL-truncated + renormalized)
- `task` (`"text"` | `"code"`) — selects text vs code task types

Returns `list[Match]` where `Match` has `path`, `line`, `text`, `score`.
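A sketch combining the knobs above: threshold-only selection over Python sources in code mode (the path and the 0.6 cutoff are illustrative):

```python
from semantic_grep import semantic_grep

# top_k=None + threshold: return everything scoring >= 0.6 instead of a fixed k.
matches = semantic_grep("retry logic with exponential backoff", "src/",
                        top_k=None, threshold=0.6,
                        include="*.py", task="code", dim=1536)
for m in matches:
    print(f"{m.path}:{m.line}  [{m.score:.2f}]")
```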
### `load_corpus(path, *, include="*.txt", granularity="paragraph") -> list[Chunk]`
Load and chunk a file or directory without embedding. Useful for inspecting what gets embedded before paying for the API call.
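A quick sketch of that inspection step (the directory is hypothetical):

```python
from semantic_grep import load_corpus

# Dry run: see exactly what would be embedded, and roughly how much.
chunks = load_corpus("/path/to/notes", include="*.md", granularity="paragraph")
print(f"{len(chunks)} chunks, ~{sum(len(c.text) for c in chunks)} chars")
print(chunks[0].path, chunks[0].line, chunks[0].text[:80])
```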
### `embed_batch(texts, task_type, *, model, dim, group_size=100) -> np.ndarray`
Lower-level: embed a list of strings directly via `:batchEmbedContents`. Returns `(N, dim)` float32 array, rows normalized when `dim < 3072`.
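A sketch of direct use, assuming the signature above; since rows at dim=768 are unit-normalized, a dot product is cosine similarity:

```python
import numpy as np
from semantic_grep import embed_batch

# Asymmetric encoding: query and docs get different task types.
q = embed_batch(["how do we retry failed jobs?"], "RETRIEVAL_QUERY",
                model="gemini-embedding-001", dim=768)
docs = ["Exponential backoff with jitter caps retries at five attempts.",
        "Lunch menu for Tuesday: soup, sandwiches."]
d = embed_batch(docs, "RETRIEVAL_DOCUMENT", model="gemini-embedding-001", dim=768)
scores = d @ q[0]                      # (2,) cosine similarities
print(docs[int(np.argmax(scores))])    # expect the retry doc to win
```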
### `format_grep(matches, *, max_text_chars=200, show_score=True) -> str`

Format matches as grep output: `path:line: snippet [score]`.
## Pipe-mode Rerank Pattern
The highest-leverage use isn't naive full-corpus semantic search — it's hybrid retrieval: fast coarse filter → semantic rerank.
```python
import subprocess
from semantic_grep import Chunk, semantic_grep, format_grep

# Stage 1: fast exact/regex prefilter with rg
result = subprocess.run(
    ["rg", "-n", "--no-heading", "error|fail|timeout", "logs/"],
    capture_output=True, text=True,
)

# Parse `path:line:text` into Chunks, skipping blank or unparseable lines
chunks = []
for raw in result.stdout.splitlines():
    if raw.count(":") < 2:
        continue
    path, line, text = raw.split(":", 2)
    chunks.append(Chunk(path=path, line=int(line), text=text))

# Stage 2: semantic rerank on the prefiltered subset
ranked = semantic_grep("intermittent queue saturation during peak traffic",
                       chunks, top_k=10)
print(format_grep(ranked))
```
This is how you scale past the "embed the whole corpus every call" limit without needing a vector DB. The exact-match stage cheaply cuts millions of lines to thousands; the semantic stage reranks those.
## Task Types (Gemini)
- `text` mode (default): query → `RETRIEVAL_QUERY`, docs → `RETRIEVAL_DOCUMENT`. Asymmetric — documented to outperform symmetric encoding for retrieval.
- `code` mode: query → `CODE_RETRIEVAL_QUERY`, docs → `RETRIEVAL_DOCUMENT`. Use when searching code with natural-language queries.
Use `SEMANTIC_SIMILARITY` (symmetric) only if you're doing pairwise sim, not retrieval. This module doesn't expose that path yet.
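As a reference, the mapping the `task` flag plausibly selects (the enum names are Gemini's documented task types; the dict itself is an illustrative sketch, not the module's actual source):

```python
# Illustrative: how task="text" / task="code" maps onto Gemini task types.
TASK_TYPES = {
    "text": {"query": "RETRIEVAL_QUERY",      "doc": "RETRIEVAL_DOCUMENT"},
    "code": {"query": "CODE_RETRIEVAL_QUERY", "doc": "RETRIEVAL_DOCUMENT"},
    # SEMANTIC_SIMILARITY would use the same type on both sides; not exposed yet.
}
```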
## Model Notes
gemini-embedding-001 (GA since Feb 2026):
- 2,048 input token limit per text. Longer texts are truncated at ~8K chars (approximation).
- Matryoshka (MRL) — 3072 native dims, safely truncatable to 1536/768/256/128.
- 3072 is auto-normalized; lower dims need client-side renorm (handled here; see the sketch after this list).
- Pricing: $0.15 / 1M input tokens. 135 medium paragraphs ≈ 15K tokens ≈ $0.002 per query.
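What that truncate-and-renormalize step looks like, as a minimal numpy sketch mirroring what the module is described as doing internally for `dim < 3072`:

```python
import numpy as np

def truncate_mrl(vecs: np.ndarray, dim: int) -> np.ndarray:
    """MRL: keep the leading `dim` components, then renormalize to unit length."""
    out = vecs[:, :dim].astype(np.float32)
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.clip(norms, 1e-12, None)  # guard against zero vectors
```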
`gemini-embedding-2-preview` (March 2026) is multimodal and currently top of MTEB. Set `model="gemini-embedding-2-preview"` to opt in once the preview stabilizes.
## Limitations (v0.1.1)
- No persistent index. Every call re-embeds the corpus. Fine for <~1K chunks; prohibitive for real knowledge bases. Phase 2: cache embeddings by content hash (see the sketch after this list).
- Token budget is approximated by char count (×1.5). Conservative for mixed-script text; over-truncates English slightly. A real count would use the Gemini tokenizer endpoint, but that costs an extra call per embed.
- All-or-nothing batch failures. If one text in a group of 100 overflows or is rejected by safety filters, the whole batch fails and the 99 good ones are lost. No per-index fallback yet.
- No memory ceiling on corpus size. `semantic_grep` pre-allocates an `(N, dim)` float32 array; 1M chunks at dim=768 ≈ 3 GB. The caller is responsible for sane chunk counts. `load_corpus` also follows symlinks via `rglob` — fine in a trusted single-user container, not for untrusted paths.
- Sequential batch groups. `group_size=100` per HTTP call; groups run serially. For >1K chunks, add asyncio — not needed yet.
- No CLI shim. Called as a Python module, not a subprocess. Per design: "within an LLM rather than calling out to one."
- Embedding function lives here, not in `invoking-gemini`. Should be factored up when `invoking-gemini` adds embedding support. Tracked as a followup.
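The Phase-2 caching idea from the first bullet, sketched; the cache location and key scheme are assumptions, while `embed_batch` is the module function documented above:

```python
import hashlib
from pathlib import Path

import numpy as np
from semantic_grep import embed_batch

CACHE_DIR = Path(".embed_cache")  # hypothetical location

def cached_embed(texts, task_type, *, model="gemini-embedding-001", dim=768):
    """Embed texts, reusing on-disk vectors keyed by content hash."""
    CACHE_DIR.mkdir(exist_ok=True)
    out = np.empty((len(texts), dim), dtype=np.float32)
    misses, slots = [], []
    for i, text in enumerate(texts):
        key = hashlib.sha256(f"{model}:{dim}:{task_type}:{text}".encode()).hexdigest()
        path = CACHE_DIR / f"{key}.npy"
        if path.exists():
            out[i] = np.load(path)
        else:
            misses.append(text)
            slots.append((i, path))
    if misses:  # only unseen chunks hit the API
        fresh = embed_batch(misses, task_type, model=model, dim=dim)
        for (i, path), vec in zip(slots, fresh):
            np.save(path, vec)
            out[i] = vec
    return out
```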
## Related Skills
- `invoking-gemini` — sibling; handles Gemini text + image generation through the same CF gateway. Shares credential pattern.
- `searching-codebases` — regex/AST search. Use first when the query is a known pattern.
- `extracting-keywords` — YAKE keyword extraction; orthogonal, but pairs well for building query terms from a long prompt.
- `exploring-codebases` — for understanding repo structure. Semantic-grep doesn't replace AST-based navigation.
## Attribution
Conceptually inspired by jina-grep-cli — we kept the retrieval shape (grep-compatible output, asymmetric query/doc embeddings, threshold + top-k) but swapped the MLX/Apple-Silicon backend for a portable Gemini API call. The original's pipe-mode rerank pattern is the most generalizable idea it contributes and is preserved here.