AGENTS.md — Magic Indexer (single-source onboarding)
If you are an AI assistant or a new contributor opening this repo for the first time, this file is your complete onboarding. Reading just this file should be enough to be useful. The other docs (`docs/RUNBOOK.md`, `docs/reviews/`, `SECURITY.md`, `README.md`) are deeper references; this file is the digest.
If, after reading this file, you are unsure about anything that isn't covered here, that is a documentation bug — flag it to the operator and they will tighten this file rather than re-explain.
What this is
Magic Indexer is an AT Protocol AppView server that ingests records from Jetstream + labels from ATProto labelers and exposes both via a dynamically-generated GraphQL API.
It is the `hb-agent/magic-indexer` fork of the `hypercerts-org/hyperindex` project. The compiled binary inside the container is named `hypergoat`, and the Go module path is `github.com/GainForest/hypergoat`. Both names are historical artefacts from when the project was originally called Hypergoat. Every command in this repo that mentions `hypergoat` or `./cmd/hypergoat` is referring to that binary path — not a different product. Do not rename the module or the binary; it would touch ~80 files for a brand-only change.
The product name in user-facing documentation, configuration, deployments, and conversation is Magic Indexer.
Live deployment (the dev environment)
| Item | Value |
|---|---|
| Public URL | https://magic-indexer-dev.up.railway.app |
| Public GraphQL | https://magic-indexer-dev.up.railway.app/graphql |
| GraphQL subscriptions | wss://magic-indexer-dev.up.railway.app/graphql/ws |
| Admin GraphQL | https://magic-indexer-dev.up.railway.app/admin/graphql |
| GraphiQL playground | https://magic-indexer-dev.up.railway.app/graphiql |
| GraphiQL admin | https://magic-indexer-dev.up.railway.app/graphiql/admin |
| Health | https://magic-indexer-dev.up.railway.app/health |
| Stats | https://magic-indexer-dev.up.railway.app/stats |
| Prometheus metrics | https://magic-indexer-dev.up.railway.app/metrics |
| Railway project ID | 7d6c4e52-de61-439f-96c0-3ded4114b9be |
| Railway project name | magic-index |
| Railway environment | dev |
| Railway service | magic-indexer |
| Railway dashboard | https://railway.com/project/7d6c4e52-de61-439f-96c0-3ded4114b9be |
| GitHub repo | https://github.com/hb-agent/magic-indexer |
| Active branch | per-labeler-definitions |
| Backing database | Postgres 18, Railway-managed, in the same project |
| Admin UI | https://magic-indexer-admin.vercel.app (Next.js, confidential ATProto OAuth) |
| Currently ingesting from | Jetstream (24 lexicon-derived collections, including app.certified.temp.graph.endorsement) |
The 24 collections currently being ingested all start with one
of three NSID prefixes: org.hypercerts.*, app.certified.*,
org.hyperboards.*. The app.certified.temp.graph.endorsement
lexicon supports the trusted-evaluator feed filter
(see docs/architecture/0001-trusted-evaluator-feed-filter.md). Lexicons are uploaded via the admin API
from the npm package @hypercerts-org/lexicon (see Operations
below).
Full-text search
All typed collection queries and the generic records query support
a search: String parameter for full-text search across record
content. The search uses Postgres tsvector with a GIN index for
fast, stemmed queries.
Searched fields (weighted): title (A), shortDescription (B), description (C), workScope (D, string variant only).
Behavior: terms are space-separated and implicitly ANDed.
English stemming is applied ("forest" matches "forests"). Special
characters are stripped by plainto_tsquery — no injection risk.
Max query length: 500 characters.
Example:
{ orgHypercertsClaimActivity(search: "forest conservation", first: 10, authors: ["did:plc:..."]) {
edges { node { uri title shortDescription } }
pageInfo { hasNextPage }
} }
Combinable with: authors, labels, excludeLabels, labelerDids.
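A hedged sketch of the SQL shape this compiles to (the `search_tsv` column and exact query are assumptions for illustration, not the repo's actual identifiers):

```sql
-- Illustrative only: assumes a maintained, weighted tsvector column.
-- plainto_tsquery ANDs space-separated terms and strips special characters,
-- which is why the search parameter carries no injection risk.
SELECT uri, json->>'title' AS title
FROM record
WHERE collection = 'org.hypercerts.claimActivity'
  AND search_tsv @@ plainto_tsquery('english', 'forest conservation')
ORDER BY indexed_at DESC
LIMIT 10;
```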
Record validation
Records are validated against their lexicon schemas at two points:
- Ingestion time (Jetstream + backfill): controlled by the `VALIDATION_MODE` env var (`disabled`/`warn`/`enforce`, default `disabled`). In `warn` mode, invalid records are logged but stored. In `enforce` mode, they are skipped.
- Query time (always on): `SanitizeRecord()` filters out records missing required fields, truncates over-long strings, and nulls invalid optional fields. This prevents NonNull propagation from killing entire query responses.
Safety rules — read these first
These are non-negotiable. Apply before doing anything that touches state.
- **Never commit secrets.** `SECRET_KEY_BASE`, `ADMIN_API_KEY`, the Railway API token, OAuth signing keys, and `.env` files stay out of git. The repo's `.gitignore` excludes `.env` and `.env.local`. `config.Validate()` at startup rejects the literal `development-secret-key-change-in-production-64chars` placeholder so a misconfigured deploy fails fast instead of booting with a public key.
- **Never echo secrets in unredacted form** in chat output, log files, or anything that could be persisted. When the operator pastes a secret to you, store it in `/tmp/...` with `chmod 600` and reference the variable; don't reproduce the value.
- **Confirm with the operator before destructive actions.** This includes (but isn't limited to): `railway down`, deleting Railway services, dropping the Postgres volume, force-pushing to `per-labeler-definitions`, deleting GitHub branches, mass issue closure, `git reset --hard`, `git rebase -i`, `gh repo delete`, dropping or truncating any database table, `railway redeploy` against a service whose latest commit you haven't built locally, rotating any secret without first confirming the operator has the new value, and any operation that affects the upstream `hypercerts-org/hyperindex` repo (this is a fork — your default scope is `hb-agent/magic-indexer`).
- **Read-then-act, not act-then-explain.** When investigating a problem, read the relevant code with file:line evidence before proposing changes. Roughly half the "CRITICAL" findings in the 23 review rounds were false positives that disappeared after looking at the actual lines cited.
- **Quality gates before commit.** No code change is "done" until all four pass: `go build ./...`, `go vet ./...`, `go test -race ./...`, `golangci-lint run ./...`. CI also runs Postgres tests via `TEST_DATABASE_URL` and a reproducible-build diff job. Both should stay green.
- **Commit message convention.** Each commit ends with a `Co-Authored-By:` trailer naming the model/agent that wrote it. The repo's recent history follows this; match the style.
- **`git push` is not optional.** A change you didn't push is work that doesn't survive the session. The "Landing the plane" checklist at the bottom of this file is mandatory.
Browser automation in this dev container
This dev container has a working agent-browser install that controls a real headless Chromium. Use it when you need to verify something behaves correctly in a real browser — not just in SSR HTML, not just in a curl probe. Examples: client-side hydration errors, CORS rejections, React error boundaries, post-hydration data fetches, or "does the user actually see X on the page".
Three pieces had to come together for this to work:
- `agent-browser` CLI (npm package, native Rust): installed globally via `npm install -g agent-browser`. Version `0.25.3` or later.
- Chromium binary: this dev container is Linux ARM64. Chrome for Testing has no ARM64 builds, so we use the Chromium that ships with Playwright instead. Install with `npx --yes playwright@latest install chromium --with-deps`. Lands at `~/.cache/ms-playwright/chromium-1217/chrome-linux/chrome`.
- Wrapper script at `~/.local/bin/ab` that always passes `--executable-path` pointing at the Playwright Chromium so you never have to remember it. Use `ab` instead of `agent-browser` for everything in this repo.
If `ab --version` doesn't work in a fresh session, both the npm install and the Playwright Chromium download will need to be re-run; then drop the wrapper back in:
npm install -g agent-browser
npx --yes playwright@latest install chromium --with-deps
mkdir -p ~/.local/bin
cat > ~/.local/bin/ab <<'EOF'
#!/usr/bin/env bash
exec agent-browser --executable-path "$HOME/.cache/ms-playwright/chromium-1217/chrome-linux/chrome" "$@"
EOF
chmod +x ~/.local/bin/ab
The chromium directory name (chromium-1217) is the Playwright
revision number and may differ on a fresh install. Update the
wrapper if needed.
Common usage
ab open https://magic-indexer-dev.up.railway.app/graphiql
ab snapshot # accessibility tree with refs (best for AI)
ab screenshot /tmp/page.png # raster image
ab eval '<javascript>' # run JS in the page context
ab click @e10 # click element by ref from snapshot
ab fill @e3 "search term" # fill an input
ab close # close session
What it caught last session
The integration test of certs-social → magic-indexer produced
the right SSR HTML, the right Vercel build, the right TypeScript,
and a passing npm run build — but the live page in a real
browser showed Something went wrong / Failed to fetch because
the magic-indexer CORS allowlist didn't include the Vercel
preview URL. Caught only because ab open + ab snapshot
exposed the post-hydration error state. None of the static
checks would have found it.
What it can't do
ab controls a headless browser. It can't:
- Watch you click around interactively (use your own browser).
- Step through React DevTools.
- Show you the same console output you'd see in Chrome DevTools in detail (use `ab eval` to evaluate JS in the page as a workaround, or open the page in your own browser for interactive debugging).
For deep interactive debugging, your local browser is still
the right tool. ab is for "verify the live deployment renders
correctly without me having to open a browser tab."
Self-test for a fresh session
After you've read this file, test your understanding by mentally answering these. If you can answer all of them without re-reading, you're oriented:
- What is the project called, and what does it do in one sentence?
- Where is it deployed, on what platform, with what backing store?
- What's currently in the database (records, actors, lexicons, labelers)?
- What's the active branch and the last commit on it?
- Which two issues are deliberately deferred and why?
- What two facts about lexicons do you have to remember when uploading them?
- What is the one command an operator runs to deploy a code change?
- What's the difference between `RAILWAY_TOKEN` and `RAILWAY_API_TOKEN`?
- What's the rule about the `VOLUME` keyword in `Dockerfile`?
- Why is `OptionalAuth` middleware permissive on bad bearer tokens, and why is that not a security hole?
Answers are scattered through the rest of this file. If any of the questions don't have a clear answer here, that's a documentation bug — say so.
Build, test, lint (local development)
From a clean checkout:
git clone https://github.com/hb-agent/magic-indexer.git
cd magic-indexer
git checkout per-labeler-definitions
make setup # generates .env with a fresh SECRET_KEY_BASE
go run ./cmd/hypergoat
Quality gates that must pass before any commit:
go build ./... # also: make build
go vet ./...
go test ./... # also: make test (adds -race)
go test -race ./...
golangci-lint run ./... # also: make lint
Single test patterns:
go test -v -run TestParseLexicon ./internal/lexicon/...
go test -v ./internal/graphql/admin/...
Coverage report:
make test-coverage # writes coverage.html
To run the integration test suite (build tag integration):
go test -tags=integration ./internal/integration/...
CI runs all of the above on every push to main and every PR
targeting main, against both SQLite and Postgres (TEST_DATABASE_URL),
plus a reproducible-build diff job. See .github/workflows/ci.yml.
Code style
Imports
Three groups, blank lines between them, in this order:
import (
    "context" // 1. Standard library
    "fmt"

    "github.com/go-chi/chi/v5" // 2. External packages

    "github.com/GainForest/hypergoat/internal/database" // 3. Internal packages
)
Package documentation
Every package has a doc comment on package:
// Package config handles configuration loading from environment variables.
package config
Naming
- Packages: lowercase, single word (`lexicon`, `oauth`, `backfill`).
- Files: lowercase with underscores (`did_resolver.go`).
- Types: PascalCase (`Executor`, `RecordFetcher`).
- Interfaces: noun or `-er` suffix (`Executor`, `Fetcher`).
- Acronyms: all caps (`URI`, `DID`, `HTTP`, `JSON`).
Errors
Always wrap with context, prefer %w:
if err != nil {
return fmt.Errorf("failed to query records: %w", err)
}
For OAuth-style validation errors prefer package-level sentinel
vars (see internal/oauth/dpop.go for the canonical pattern
that came out of review Round 8).
Context
Always pass ctx as the first parameter to any I/O method:
func (r *RecordsRepository) GetByURI(ctx context.Context, uri string) (*Record, error)
Repository pattern
Database access lives in internal/database/repositories/.
Constructors take a database.Executor. SQL is built with
r.db.Placeholder(n) for dialect-aware parameters. Every method
takes ctx. Don't reach into the executor from outside the
repositories layer.
Logging
Use log/slog everywhere. Structured fields, never string
interpolation:
slog.Info("Starting backfill", "collections", collections, "count", len(repos))
slog.Warn("Failed to resolve DID", "did", did, "error", err)
slog.Error("Database connection failed", "error", err)
The mutation log line in the admin handler logs variable keys, not values. Never reintroduce value logging without an audit; it's a log-injection vector that Round 3 caught and fixed.
Testing
Table-driven tests, fresh setup per test, no shared state:
func TestSomething(t *testing.T) {
tests := []struct {
name string
input string
wantErr bool
}{
{"happy path", "valid", false},
{"bad input", "junk", true},
}
for _, tc := range tests {
t.Run(tc.name, func(t *testing.T) {
_, err := DoThing(tc.input)
if (err != nil) != tc.wantErr {
t.Errorf("err = %v, wantErr %v", err, tc.wantErr)
}
})
}
}
For DB tests, use testutil.SetupTestDB(t) which honours
TEST_DATABASE_URL and falls back to in-memory SQLite.
Project structure
cmd/hypergoat/ # Main entry point (server init, routing, lifecycle)
internal/
backfill/ # Historical record backfill from AT Protocol relays
config/ # Configuration loading from environment + Validate()
consumer/ # Shared reconnection backoff (RunWithReconnect) used by jetstream, labeler, tap
cursor/ # Shared cursor persistence (atomic.Int64 Flusher) used by all consumers
database/
migrations/ # SQL migrations (auto-run on startup, transactional + "-- no-transaction" sentinel for CONCURRENTLY)
repositories/ # Data access layer (records, labels, label_definitions, filter.go for field filters, etc.)
sqlite/ # SQLite implementation (pure Go, modernc) — note: removed from production, Postgres-only
postgres/ # PostgreSQL implementation (pgx)
graphql/
admin/ # Admin API (POST-only, bearer-or-OAuth gated) — createFieldIndex, dropFieldIndex, uploadLexicons, etc.
schema/ # Public schema builder (dynamic from lexicons) + where.go for field filter extraction
resolver/ # Public resolver wiring + repository context injection
query/ # Connection types (Relay spec) + ClampPageSize + SortDirectionEnum
depth/ # Pre-execution GraphQL query depth guard (max 15 / 20)
subscription/ # WebSocket subscriptions (graphql-transport-ws)
types/ # GraphQL type mapping from lexicon definitions + filters.go (per-type FilterInput types)
ingestion/ # Shared RecordProcessor: ensure-actor → insert/delete → log-activity → publish-to-pubsub
integration/ # Integration tests (build tag: integration)
jetstream/ # Real-time AT Protocol event consumer (delegates to ingestion.RecordProcessor)
labeler/ # ATProto labeler subscribeLabels + queryLabels client
lexicon/ # Lexicon parsing, registry, NSID utilities
metrics/ # Prometheus counters + /metrics HTTP handler
notifications/ # Bluesky-pattern notifications: per-collection extractors, aggregation, seen watermark
notifications/extractors/ # Per-collection notifier implementations (endorsement, activity-contributor)
oauth/ # OAuth 2.0 + DPoP + PKCE + did:plc / did:web resolution
server/ # HTTP handlers, security headers, CORS, GraphiQL UI
tap/ # Tap sidecar consumer (crypto-verified events, ack-based delivery) — alternative to Jetstream via TAP_ENABLED
workers/ # Background jobs (activity cleanup + orphan janitor, etc.)
docs/ # RUNBOOK + reviews + plans
scripts/ # Deployment helpers (setup-env.sh)
testdata/ # Test fixtures and sample lexicons
Subsystem highlights (the things that bit us in review)
Public GraphQL authors filter
Typed collection queries accept an authors: [String!] argument
to filter by author DID. Cap: 500 DIDs per query. An empty list
means "no filter" (returns all authors). Example:
orgHypercertsClaimActivity(first: 10, authors: ["did:plc:..."]).
Field filter system (internal/database/repositories/filter.go + internal/graphql/schema/where.go)
Typed collection queries also accept a where argument with per-field
operators generated from the lexicon's scalar properties:
- Operators: `eq`, `neq`, `gt`, `lt`, `gte`, `lte`, `in`, `contains`, `startsWith`, `isNull`.
- `eq` uses `json @> $::jsonb` containment (hits the GIN `jsonb_path_ops` index). Other operators use `json->>'field'` extraction (seq scan unless an expression index exists — see admin mutations below).
- `neq` semantically means "not equal OR field absent" (includes NULLs).
- `contains` min 3 chars; `startsWith` min 1 char. Both escape `\`, `%`, `_` via `ESCAPE '\\'`.
- `in` uses `= ANY($::text[])` — a single array param instead of an expanded `IN (...)`.
- Nested paths via `__` separator (e.g., `metadata__source` → `json->'metadata'->>'source'`). Max 3 nesting levels. Auto-generating nested WhereInput fields from lexicons is deferred (issue #40); the SQL layer supports it.

Composition via `_and` / `_or` fields on WhereInput (recursive, self-referential via graphql-go `AddFieldConfig`):
- `FilterGroup` tree with `GroupAND`/`GroupOR` operators.
- `BuildFilterGroupClause` is the recursive SQL builder; proper parenthesization; global condition count capped at `MaxFilterConditions` (20) across the whole tree; max depth `MaxFilterDepth` (3).
- Field name validation (`[a-zA-Z_][a-zA-Z0-9_]*` per segment) runs before any string interpolation into SQL — this is defense-in-depth; names come from the lexicon registry, not user input.
Sort-aware keyset pagination
orderBy (string, field name) and orderDirection (ASC/DESC, default DESC)
arguments on typed collection queries. The repository layer now honors these:
the ORDER BY clause uses SortOption.BuildSortExpr() and the keyset cursor
comparison uses the sort expression (previously always indexed_at).
- Direct columns (`indexed_at`, `uri`, `did`, `collection`, `cid`, `rkey`) use the column name; anything else becomes `json->>'field'` (with nested path support via `__`).
- `NULLS LAST` in ORDER BY for both ASC and DESC.
- URI tiebreaker appended in the same direction.
- Fast-path guard: when no filters/labels/search apply, the function delegates to `GetByCollectionWithKeysetCursor`, which always sorts by `indexed_at DESC`. The `hasCustomSort` check (PR #50) prevents that path from silently ignoring a custom `orderBy` on an unfiltered query.
- Multi-column sort (`orderBy` as a list) is deferred (issue #39).
Cursor format (V2)
Cursors are base64-URL-encoded JSON arrays:
["sortField", "sortValue", "uri"]. The decoder also accepts the legacy
pipe-delimited format ("timestamp|uri") for backward compatibility; legacy
cursors only work when orderBy is indexed_at (default). Sort-field
mismatch produces a clear error.
Backward pagination
last + before arguments complement first + after. Mixed forward +
backward is rejected. Implementation: flip the sort direction + cursor
comparison, fetch last+1, reverse the slice in memory. hasPreviousPage
is true when we fetched more than last; hasNextPage for backward mode
reflects whether items exist after the returned window.
Admin expression index mutations
createFieldIndex(collection, field) and dropFieldIndex(collection, field)
on the admin GraphQL API. Generates:
CREATE INDEX CONCURRENTLY ON record ((json->>'field')) WHERE collection = 'nsid'.
Partial index (filtered by collection) keeps size small. Runs outside a
transaction via the migration runner's -- no-transaction sentinel convention.
Use this to accelerate comparison/pattern filters that the GIN index can't serve.
Shared consumer infrastructure (internal/ingestion, internal/cursor, internal/consumer)
Extracted from the original inline Jetstream consumer during the hyperindex port:
- `ingestion.RecordProcessor` — ensure-actor → insert/delete → log-activity → publish-to-pubsub. Used by both Jetstream and Tap consumers. Enforces an optional collection allowlist and rejects non-object JSON records.
- `cursor.Flusher` — `atomic.Int64` cursor value + ticker-based flush, skip-on-idle. Survives context cancellation via a bounded final flush.
- `consumer.RunWithReconnect` — exponential backoff (1s → 2min, reset after 30s of stable connection).
Tap consumer (internal/tap/)
Alternative to Jetstream when TAP_ENABLED=true. Consumes crypto-verified
events from the Bluesky Tap sidecar with ack-based delivery and per-repo
ordering. Synchronous dispatch (backpressure via the WebSocket itself is the
correct signal for ack-based protocols). Panic-recovered, exponential retry
(1s/2s/4s) per event, then skip. Connection / Dialer interfaces abstract
gorilla/websocket for testability. Trust boundary: Tap verifies MST inclusion
proofs but not signing key vs DID document (#41 deferred).
Notifications subsystem (internal/notifications/)
Bluesky-pattern notification system. Enabled via NOTIFICATIONS_ENABLED=true.
- Data model: `notification` (envelope, one row per displayed notification, optionally aggregated by `group_key`), `notification_participant` (one row per source record that contributed — unique on `(record_uri, recipient_did)` for idempotent replay and correct tombstone cascade), `actor_state` (per-user seen watermark, same as Bluesky).
- Hook: registered as a `RecordHook` on `RecordProcessor.RecordHooks`, policy `HookLogContinue`. A malformed record cannot stall firehose ingestion — hook errors are logged but don't abort the record insert. Panic-recovered per invocation. Runs on insert/update/delete.
- Extractors (`internal/notifications/extractors/`): one Go file per collection. Currently: `endorsement` (aggregates on subject URI) and `activity-contributor` (non-aggregating, fans out per contributor DID up to `MaxFanOutPerRecord` = 100).
- Idempotency: the participant table's UNIQUE `(record_uri, recipient_did)` is the replay boundary. Re-processing the same record is a no-op.
- Tombstone cascade: record delete → `DeleteByRecordURI` removes participants, decrements the envelope count, deletes the envelope at count 0, and recomputes `latest_*` from remaining participants when the removed participant was the latest.
- Update path: delete-then-re-extract, to handle activity contributor list changes correctly.
- Defense-in-depth: `isValidDID` syntactic validation, `clampSortAt` bounds timestamps to `[now-7d, now]`, `MaxReasonSubjectBytes` caps subject URIs, `MaxContributorsBeforeReject` short-circuits oversized records via a shallow JSON scan before full unmarshal.
- GraphQL (admin endpoint): `notifications(did, reasons, first, after)`, `unreadNotificationCount(did)` (capped at 50+), `updateNotificationsSeen(did, seenAt)`. Fields are merged into the admin schema via the `admin.WithExtraQueries` and `admin.WithExtraMutations` options — no cyclic import between packages.
- Cursor V1 for notifications: base64-URL JSON `["v1:notif", sort_at_iso, id]`.
- Trust boundary: public `/graphql` is unauthenticated, so notifications live on the admin endpoint and accept `did` as an argument. The certs-social proxy is the trust boundary (it resolves the session DID and forwards it). Public-endpoint migration is deferred until OAuth auth lands on `/graphql`.
Labeler subsystem (internal/labeler/)
Mirrors internal/jetstream/ but speaks the ATProto labeler
protocol:
- `client.go` — websocket client for `com.atproto.label.subscribeLabels`. Uses `fxamacker/cbor/v2` for the two-CBOR-object frame format (`#labels`, `#info`, `#error`). `SetReadLimit` bounds frame size. Non-normal close codes are surfaced at Warn; empty-body `#labels` frames are dropped explicitly; `#info` decode failures are elevated to Warn so `OutdatedCursor` signals cannot be silently lost.
- `backfill.go` — one-time `com.atproto.label.queryLabels` paginated backfill via `hashicorp/go-retryablehttp`.
- `consumer.go` — lifecycle: load cursor → backfill if needed → connect → stream labels → flush cursor on a ticker. Exponential backoff on reconnect. Panic-recovered at the goroutine boundary in `cmd/hypergoat/main.go` so one labeler cannot take down the process. Logs cursor gaps at Warn.
Label definitions are auto-upserted via INSERT ... ON CONFLICT DO NOTHING keyed on the composite (src, val) PK from migration
009 — concurrent labelers cannot race a new (src, val) pair.
Jetstream consumer (internal/jetstream/)
- `client.go` — websocket client for the Jetstream firehose. `SetReadLimit` (8 MiB) bounds per-frame memory.
- `consumer.go` — lifecycle + reconnect loop. The cursor is persisted to the `config` table every 5 s by default. Critical invariant from Round 14: `c.cursorDone`, `c.config`, and `c.ctxCancel` must all be mutated under `clientMu`; the `Start()` reconnect loop and `UpdateCollections()` both take the lock around their state writes.
- The lexicon change callback dynamically restarts the Jetstream consumer with a fresh `wantedCollections` list whenever lexicons are uploaded via the admin API. No process restart needed.
Security headers middleware (internal/server/security_headers.go)
Emits X-Content-Type-Options: nosniff, X-Frame-Options: DENY,
Referrer-Policy: no-referrer, and conditionally
Strict-Transport-Security (only when EXTERNAL_BASE_URL is
https://). /graphiql sets its own Content-Security-Policy
allowing the unpkg CDN for bootstrap assets; JSON API endpoints
keep the tighter default.
OptionalAuth middleware (internal/oauth/middleware.go)
Important contract: when the Authorization header is
present but fails OAuth token validation, OptionalAuth passes
through with no user context — it does not return 401. This
is required because /admin/graphql is mounted with
OptionalAuth and the admin handler accepts two auth schemes:
a validated OAuth user, and an ADMIN_API_KEY bearer token.
Returning 401 in the middleware on a non-OAuth bearer (like the
admin API key) would prevent the API-key path from ever being
reached. The admin handler does its own 401 check on empty
userDID + no API key, so security posture is unchanged.
This behaviour was introduced in Round 8, after the first deploy surfaced the bug live: admin auth via `ADMIN_API_KEY` was returning `invalid_token`.
Activity log empty-event-json normalisation (internal/database/repositories/jetstream_activity.go)
The Jetstream consumer passes string(commit.Record) into
LogActivity. For delete operations commit.Record is nil and
the result is "". Postgres JSONB NOT NULL rejects empty
strings; SQLite stores them loosely as TEXT. The repository
normalises empty / whitespace-only payloads to the JSON literal
null so both dialects accept the row. Discovered live during
the Railway deploy.
config.Validate() (internal/config/config.go)
- Refuses to start if `SECRET_KEY_BASE` is shorter than 64 bytes or matches the literal `development-secret-key-change-in-production-64chars` placeholder.
- Refuses to start on an out-of-range `PORT`.
- Logs a Warn (not a silent fallback) when `getEnvInt` is given a malformed integer value.
Migrations (internal/database/migrations/)
- Each migration's `UpSQL` and the `schema_migrations` insert run inside a single transaction (`applyMigrationTx`). A crash in the middle leaves both rolled back. `Rollback` follows the same pattern.
- Migrations 001–009 are present in both `sqlite/` and `postgres/` variants and are tested for round-trip equivalence.
Repositories that touch labels
- `LabelsRepository` — `Insert`/`InsertNegation` use `ON CONFLICT DO NOTHING` keyed on a partial unique index per migration 007. Active-set queries (`GetByURIs`, `HasTakedown`, `GetTakedownURIs`, plus the records label-filter subquery) all filter expired labels via `(l.exp IS NULL OR l.exp > nowLiteral())`.
- `LabelDefinitionsRepository` — composite `(src, val)` primary key from migration 009, so two labelers can both define `high-quality` with different semantics.
- `OAuthDPoPJTIRepository.InsertIfNew` — atomic `INSERT ... ON CONFLICT DO NOTHING` for race-safe DPoP replay detection.
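The active-label filter takes roughly this SQL shape (a sketch: table and column names beyond `src`, `val`, `exp` are assumptions, and the repo uses `nowLiteral()` for dialect-aware time):

```sql
-- Illustrative only: active (non-negated, non-expired) labels on one URI.
SELECT l.src, l.val, l.uri
FROM label l
WHERE l.uri = 'at://did:plc:abc/app.bsky.feed.post/xyz'
  AND l.neg = FALSE
  AND (l.exp IS NULL OR l.exp > NOW());
```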
Deploying — the short version
The full deploy playbook (first-time provisioning, lexicon
upload, secret rotation, common gotchas) is in
docs/RUNBOOK.md. Read that before
touching the live environment for anything beyond a routine
code redeploy.
Routine code deploy
cd /path/to/magic-indexer
git checkout per-labeler-definitions
git pull
export RAILWAY_API_TOKEN='<from-password-manager>'
railway up --service magic-indexer --detach
railway logs --service magic-indexer --deployment --lines 100
Watch a deploy
railway logs --service magic-indexer --build # build phase
railway logs --service magic-indexer --deployment # runtime
Railway gotchas (the things that broke our first deploys)
- `VOLUME` is banned in Dockerfiles. Railway rejects any `VOLUME` instruction. Don't reintroduce it. Use Railway's native volume mechanism via the dashboard if you need persistent storage.
- Use `RAILWAY_API_TOKEN` for account-scoped tokens, not `RAILWAY_TOKEN`. `RAILWAY_TOKEN` is for project-scoped tokens. Whoami fails silently with the wrong variable.
- `railway add --database postgres` shows a prompt that looks like a hang, but the service is created anyway. Don't double-run; you'll get duplicate Postgres services. If you do, delete the duplicate via the GraphQL API: `mutation { serviceDelete(id: "<dup-id>") }`.
- `railway variables --set NAME=` (empty value) is rejected by the CLI. To clear a variable, use the GraphQL API: `mutation { variableUpsert(input: { ..., value: "" }) }`.
- HSTS only emits when `EXTERNAL_BASE_URL` starts with `https://`. A deployed instance with `http://` in that env var will not send HSTS. By design.
- Railway auto-discovers and exposes `${{Postgres.DATABASE_URL}}` variable references at runtime; use that form, not the raw resolved URL, so a Postgres credential rotation propagates automatically.
Operations — the short version
The full operator playbook is in docs/RUNBOOK.md.
Two essential rules to remember:
Lexicons come from npm, never from main
The canonical source for hypercerts/certified/hyperboards
lexicons is the npm package @hypercerts-org/lexicon. Do not
read from the upstream hypercerts-org/hypercerts-lexicon main
branch directly. The README of that repo says so explicitly,
and there's a good reason: main is unstable and contains
work-in-progress schema changes that may be broken or
incompatible. The npm package is the versioned, tested
distribution.
To upload lexicons matching a set of NSID prefixes:
# 1. Resolve latest version
VERSION=$(curl -s https://registry.npmjs.org/@hypercerts-org/lexicon \
| python3 -c "import json,sys; print(json.load(sys.stdin)['dist-tags']['latest'])")
# 2. Download the tarball
curl -sL "https://registry.npmjs.org/@hypercerts-org/lexicon/-/lexicon-$VERSION.tgz" \
-o /tmp/lexicon.tgz
mkdir -p /tmp/lexicon-pkg && tar -xzf /tmp/lexicon.tgz -C /tmp/lexicon-pkg
# 3. Filter to the prefixes you want
cd /tmp/lexicon-pkg
mkdir -p upload-staging
find package/lexicons -name "*.json" | while read f; do
id=$(python3 -c "import json; print(json.load(open('$f'))['id'])")
case "$id" in
org.hypercerts.*|app.certified.*|org.hyperboards.*)
rel=${f#package/lexicons/}
mkdir -p "upload-staging/$(dirname "$rel")"
cp "$f" "upload-staging/$rel"
;;
esac
done
# 4. Zip + base64
( cd upload-staging && zip -r ../lexicons.zip . )
base64 -w0 lexicons.zip > lexicons.zip.b64
# 5. Upload via admin GraphQL
ADMIN_API_KEY='<from-password-manager>'
ADMIN_DID='did:plc:<your-did>'
python3 -c "
import json
print(json.dumps({
'query': 'mutation Upload(\$zip: String!) { uploadLexicons(zipBase64: \$zip) }',
'variables': {'zip': open('lexicons.zip.b64').read().strip()}
}))" > upload-payload.json
curl -X POST https://magic-indexer-dev.up.railway.app/admin/graphql \
-H "Authorization: Bearer $ADMIN_API_KEY" \
-H "X-User-DID: $ADMIN_DID" \
-H "Content-Type: application/json" \
--data-binary @upload-payload.json
# expected: {"data":{"uploadLexicons":<count>}}
After upload, the Jetstream consumer automatically restarts
with the new union of `wantedCollections`. No human action needed.
Labeler enable / disable / pause
```bash
# Enable: comma-separated DIDs
railway variables --service magic-indexer \
  --set "LABELER_DIDS=did:plc:abc...,did:plc:def..."

# Disable all: empty string via GraphQL (CLI doesn't allow empty values)
curl -X POST https://backboard.railway.com/graphql/v2 \
  -H "Authorization: Bearer $RAILWAY_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query":"mutation { variableUpsert(input: { projectId: \"7d6c4e52-de61-439f-96c0-3ded4114b9be\", environmentId: \"<env-id>\", serviceId: \"<service-id>\", name: \"LABELER_DIDS\", value: \"\" }) }"}'
railway redeploy --service magic-indexer --yes

# Pause one labeler without restart (admin endpoint)
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "https://magic-indexer-dev.up.railway.app/admin/labeler/pause?did=did:plc:..."

# Reset cursor (force re-backfill on next start)
curl -X POST -H "Authorization: Bearer $ADMIN_API_KEY" \
  "https://magic-indexer-dev.up.railway.app/admin/labeler/reset?did=did:plc:..."
```
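When scripting pause/reset across several labelers, building the admin URL with proper query encoding avoids shell-quoting mistakes. A sketch — the two endpoint paths come from the commands above; the helper itself, and the choice to percent-encode the DID, are assumptions:

```python
from urllib.parse import urlencode

# Base URL from the live-deployment table; the helper is illustrative.
ADMIN_BASE = "https://magic-indexer-dev.up.railway.app/admin"

def labeler_admin_url(action, did):
    """Build the admin URL for one labeler DID.

    `action` is 'pause' or 'reset' (the two endpoints shown above).
    urlencode percent-encodes the DID so it survives as a query value.
    """
    if action not in ("pause", "reset"):
        raise ValueError(f"unknown action: {action}")
    return f"{ADMIN_BASE}/labeler/{action}?{urlencode({'did': did})}"
```

Loop over a DID list and `curl -X POST` (or `requests.post`) each URL with the `Authorization: Bearer` header shown above.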
Diagnose "why is this record hidden?"
```bash
curl -H "Authorization: Bearer $ADMIN_API_KEY" \
  "https://magic-indexer-dev.up.railway.app/admin/label-chain?uri=at://did:plc:abc/app.bsky.feed.post/xyz"
```
Returns every label on the URI (active, negated, expired) with provenance. Bypasses the public query path's filters because this is a diagnostic view.
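As a sketch of how such a chain resolves to the currently active set — the field names `src`, `val`, `neg`, `exp` are assumptions modeled on ATProto label semantics, where a later label with `neg: true` retracts an earlier label with the same source and value, and an expired label is inactive:

```python
import time

def active_labels(chain, now=None):
    """Reduce a label chain to the set of currently active (src, val) pairs.

    Labels apply in order: a negation retracts the matching (src, val);
    an 'exp' timestamp in the past makes a label inactive.
    """
    now = time.time() if now is None else now
    active = set()
    for label in chain:
        key = (label["src"], label["val"])
        if label.get("neg"):
            active.discard(key)       # negation retracts the earlier label
        elif label.get("exp") is None or label["exp"] > now:
            active.add(key)           # unexpired positive label
    return active
```

If the diagnostic view shows a label here that a public query hides, the client-side `excludeLabels` filter is the usual explanation.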
Common labeler failure modes
| Symptom | Likely cause |
|---|---|
| `dial labeler: connection refused` | Labeler's PLC entry points at a non-public host (e.g. `http://localhost:4100`). Operator must update the DID doc. |
| `dial labeler: websocket: bad handshake` | Labeler host serves `queryLabels` over HTTP but not `subscribeLabels` over WS. Backfill works; the live stream doesn't. |
| `Labeler backfill complete ... received=0` | No labels published under this DID's `src`. Either a new labeler with no data, or the wrong DID. |
| Cursor gap warning in logs | Labeler dropped frames upstream. Indexer keeps going. |
The reconnect loop uses exponential backoff (1 s → 2 min cap), so a permanently broken labeler settles to one log line every two minutes.
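The reconnect schedule above can be sketched as follows. The doubling factor and the absence of jitter are assumptions; the source states only the 1 s start and the 2 min cap:

```python
def backoff_delays(max_attempts, base=1.0, cap=120.0):
    """Exponential reconnect delays: start at `base` seconds,
    double each attempt, never exceed `cap` (the 2-minute ceiling)."""
    delay, out = base, []
    for _ in range(max_attempts):
        out.append(delay)
        delay = min(delay * 2, cap)
    return out
```

After seven doublings the delay pins at the cap, which is why a dead labeler costs exactly one reconnect attempt (and one log line) every two minutes.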
Review history (so you don't waste a session)
This branch went through 23 rounds of overnight review
producing 59 fixes and 3 regression tests before the first
deploy. The per-round logs and final reports are in
docs/reviews/. Read the index there if you
suspect something has already been audited.
A comprehensive security audit was performed on 2026-04-13
(see docs/AUDIT_REPORT_2026-04-13.md).
It identified 29 findings (4 Critical, 5 High, 8 Medium) and
fixed 15 of them across 14 commits. The remaining items are
low-severity or require architectural changes.
Combined totals:
| Rounds | Reviewers | Critical | Major | Minor | Nice | Fixed |
|---|---|---|---|---|---|---|
| 1–10 | 200 | 35 | 100 | 95 | 19 | 55 fixes + 3 regression tests |
| 11–18 | 160 | 2 | 1 | 0 | 0 | 3 fixes (jetstream state races, Round 14) |
| 19–23 | 100 | 0 | 0 | 0 | 0 | 1 mid-deploy fix (OptionalAuth pass-through) |
| Audit | 10+ | 4 | 5 | 8 | 12 | 15 fixes across 14 commits |
| total | 470+ | 41 | 106 | 103 | 31 | 74 fixes + 3 regression tests |
Items deliberately deferred (do not re-discover)
The open issues below are deliberate deferrals. Each has its full rationale + design questions documented as comments on the GitHub issue. Read the comments before proposing work in any of these areas.
- #10 — Labeler signature verification. Re-open when a labeler we ingest starts shipping cryptographic signatures against a stable scheme.
- #13 — GDPR hard-delete endpoint. Re-open when there's a real erasure request or a legal obligation.
- #57 — Deferred hardening (service-auth for notifications). The April-2026 PR landed the verifier, the `/notifications/graphql` endpoint, and the `.well-known/atproto-did` handler. What it did NOT land: per-`iss`/IP resolver throttle, negative cache, serve-stale on PLC outage, bad-signature key-rotation retry, `caller_hash` metric label, persistent jti store. Sentinels and metric helpers are already in place; wire them up when (a) we see real abuse, (b) we add a second replica, or (c) the plan file (`/workspace/issue-57-plan.md`) says we're ready.
- #26 — Deploy 2 (sortAt exposure). The April-2026 bundled PR shipped Deploy 1: migration 017 plus `ingestion.ComputeSortAt`, which writes `sort_at` on every new insert. Deploy 2 — backfill of existing rows, the `NOT NULL` flip, the `sortAt` GraphQL field, and `ORDER BY COALESCE(sort_at, indexed_at)` queries — stays deferred until Deploy 1 has been live long enough that the NULL tail is small. Check `SELECT count(*) FROM record WHERE sort_at IS NULL` before scheduling it.
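As a sketch of the Deploy-2 ordering semantics — the column names come from the `ORDER BY COALESCE(sort_at, indexed_at)` query above; the dict representation and ascending direction are assumptions:

```python
def sort_key(record):
    """Python equivalent of COALESCE(sort_at, indexed_at):
    rows with a computed sort_at use it; the remaining NULL tail
    falls back to indexed_at. The real columns live in Postgres —
    this is only an illustration of the fallback."""
    sort_at = record.get("sort_at")
    return sort_at if sort_at is not None else record["indexed_at"]
```

Once the NULL tail is backfilled and the `NOT NULL` flip lands, the fallback branch becomes dead code and the `COALESCE` can be dropped.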
Other things that came up in review and were intentionally not changed:
- The Go module path stays `github.com/GainForest/hypergoat` and the binary stays `hypergoat`. Renaming would touch ~80 files for a brand-only change. The product is "Magic Indexer" in docs; the binary is `hypergoat` on disk.
- Takedown is opt-in. A record with an active `!takedown` label is not hidden by default. Clients must pass `excludeLabels: ["!takedown"]` explicitly. This is a deliberate product decision (the indexer is labeler-neutral).
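The opt-in takedown policy can be sketched as a predicate. Representing a record by a plain list of its active label values is an assumption; the real query path resolves negation and expiry first:

```python
def visible(active_label_vals, exclude_labels):
    """Labeler-neutral visibility: a record is hidden only when one of
    its active label values appears in the client's excludeLabels list.
    '!takedown' gets no special default treatment."""
    return not set(active_label_vals) & set(exclude_labels)
```

A client that wants Bluesky-style behaviour simply always sends `excludeLabels: ["!takedown"]`; the indexer itself stays neutral.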
Landing the plane (mandatory checklist when ending a session)
- Quality gates (if code changed):

  ```bash
  go build ./...
  go vet ./...
  go test -race ./...
  golangci-lint run ./...
  ```

  All four green, or you have a real reason for the failure that's documented in the commit message.

- Commit with the right convention:

  ```bash
  git add -A
  git commit -m "<area>: <one-line summary>

  <body>

  Co-Authored-By: <your model name> <noreply@anthropic.com>"
  ```

  Refer to closed issues with `Closes #N` where applicable.

- Push:

  ```bash
  git push origin per-labeler-definitions
  git status   # MUST show "up to date with origin"
  ```

- Verify: nothing that affects the live deployment is considered "shipped" until `railway up` succeeds, `/health` returns 200, and the relevant log line you expected is visible in `railway logs --service magic-indexer --deployment`.

- Don't leave secrets in `/tmp` if the session is ending without recovery. `shred -u /tmp/<file>`, or leave them in place if the operator will be back to use them.
Never stop before pushing. Local-only work is work that doesn't survive.
See also (for the things this digest abbreviates)
- `docs/RUNBOOK.md` — full operator playbook with first-time deploy, lexicon walkthrough, secret rotation, incident response, every gotcha worked out long-form.
- `docs/reviews/README.md` — index of the 23-round overnight review history.
- `SECURITY.md` — required env vars, reverse-proxy rate limits, admin auth contract.
- `README.md` — high-level project intro and live URL.
- `scripts/setup-env.sh` — what `make setup` actually runs.