## Architecture rules (non-negotiable)
- Typed contracts are non-negotiable. Never widen a contract field to `dict[str, Any]` or `Any`. Every module output is a typed Pydantic `BaseModel`.
- No fallback evaluators. Unmapped criteria surface as `UNMAPPED` in `CriterionEvaluation.result`. Do not add a default/catch-all evaluator.
- Each review-data field is written from exactly one source. The assembler maps one module → one field. No merging, no fan-in.
- Evaluator technique is private. Rule, LLM, or hybrid: all implement the same `BaseEvaluator.evaluate()` interface. The tree evaluator does not care how the decision was made.
- `required_fields` is a contract. List the field path(s) on `BaseReviewData` that your evaluator needs. The tree evaluator enforces non-None before calling `evaluate()`. Do not add None-guards inside evaluators for required fields (a minimal sketch follows this list).
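A minimal sketch of these contracts, with a stubbed `BaseEvaluator` and hypothetical names (`AuthorAffiliations`, `IndustryAffiliationEvaluator`, the `author_affiliations` field); the repo's real base classes may differ in detail:

```python
from abc import ABC, abstractmethod
from typing import ClassVar

from pydantic import BaseModel


class AuthorAffiliations(BaseModel):
    """Hypothetical module output: every field concretely typed, no dict[str, Any]."""
    institution_names: list[str]
    has_industry_affiliation: bool


class BaseEvaluator(ABC):
    # Stub standing in for the repo's real base class.
    required_fields: ClassVar[list[str]] = []

    @abstractmethod
    def evaluate(self, data: BaseModel) -> object: ...


class IndustryAffiliationEvaluator(BaseEvaluator):
    # The contract: the tree evaluator verifies each dotted path is non-None
    # on the review data before evaluate() is ever called.
    required_fields: ClassVar[list[str]] = ["author_affiliations.has_industry_affiliation"]

    def evaluate(self, data: BaseModel) -> object:
        # No None-guards here: required_fields already guarantees this path is populated.
        return data.author_affiliations.has_industry_affiliation  # type: ignore[attr-defined]
```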
## Key architectural boundaries
- `BaseExtractionModule[TOutput]`: typed generic; `output_schema` is resolved at class-definition time via `__init_subclass__`. Always parametrize with a concrete `BaseModel` subclass (see the sketch after this list).
- `@register_evaluator(*criterion_codes)`: exact-match registry; one evaluator per criterion code; raises on duplicate. Registration triggers at Django app-ready time via `evaluation/evaluators/__init__.py`.
- `required_fields` contract: declared as `ClassVar[list[str]]` on `BaseEvaluator`. The tree evaluator checks each dotted path against `data` before calling the evaluator. `None` at any point in the path = `INSUFFICIENT_INFO`.
- No `dict[str, Any]` anywhere: enforced across module contracts, review data, the assembler, and evaluator inputs. Use `model_dump()`/`model_validate()` at DB boundaries only.
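To make the `__init_subclass__` and registry mechanics concrete, here is a hand-rolled approximation. The real implementations live in the repo; this sketch only illustrates the pattern, and `CitationExtract`, `CitationModule`, and the criterion code `"C-12"` are invented for the demo:

```python
from typing import Any, ClassVar, Generic, TypeVar, get_args

from pydantic import BaseModel

TOutput = TypeVar("TOutput", bound=BaseModel)

_EVALUATOR_REGISTRY: dict[str, type] = {}


def register_evaluator(*criterion_codes: str):
    """Exact-match registry: one evaluator per criterion code, raising on duplicates."""
    def decorator(cls: type) -> type:
        for code in criterion_codes:
            if code in _EVALUATOR_REGISTRY:
                raise ValueError(f"Duplicate evaluator for criterion code {code!r}")
            _EVALUATOR_REGISTRY[code] = cls
        return cls
    return decorator


class BaseExtractionModule(Generic[TOutput]):
    output_schema: ClassVar[type[BaseModel]]

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Capture the concrete TOutput argument at class-definition time, so a
        # module can never exist without a concrete BaseModel output schema.
        for base in getattr(cls, "__orig_bases__", ()):
            args = get_args(base)
            if args and isinstance(args[0], type) and issubclass(args[0], BaseModel):
                cls.output_schema = args[0]


class CitationExtract(BaseModel):
    cited_dois: list[str]


class CitationModule(BaseExtractionModule[CitationExtract]):
    pass


@register_evaluator("C-12")  # hypothetical criterion code
class DemoEvaluator:
    pass


assert CitationModule.output_schema is CitationExtract
```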
## LLM determinism
`temperature=0` helps but does not guarantee identical output across API versions. Cached fixtures in `fixtures/cached_llm_responses/` paper over this for tests. Do not delete or hand-edit those files — regenerate them via `scripts/generate_fixtures.py`.
The provider is env-driven (`LLM_PROVIDER`): anthropic (default), gemini, openai, groq, openrouter. Prompts must produce strict JSON that parses cleanly across providers — the contract is the Pydantic schema, not any one provider's quirks. All provider paths pass `temperature=0`.
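As an illustration of that contract, a hedged sketch of provider resolution and strict-JSON parsing; the function names here are assumptions, not the repo's actual client API:

```python
import os

from pydantic import BaseModel, ValidationError

SUPPORTED_PROVIDERS = ("anthropic", "gemini", "openai", "groq", "openrouter")


def resolve_provider() -> str:
    # Env-driven selection; anthropic is the documented default.
    provider = os.environ.get("LLM_PROVIDER", "anthropic")
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    return provider


def parse_llm_response(raw_text: str, schema: type[BaseModel]) -> BaseModel:
    # The contract is the Pydantic schema, not any provider's quirks: whichever
    # provider produced raw_text, it must be strict JSON that validates here.
    try:
        return schema.model_validate_json(raw_text)
    except ValidationError as exc:
        raise ValueError(f"Provider output violated the schema contract: {exc}") from exc
```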
## Cache key lifecycle
Cache keys are derived from the *data*, not the prompt. The extraction cache key now also includes a prompt hash, so prompt edits miss the cache cleanly; evaluator cache keys, however, do NOT include the evaluator version. If you change an evaluator's system prompt, bump the `cache_key_prefix`, delete the cached file by hand, or re-run `scripts/generate_fixtures.py`.
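A sketch of that asymmetry; the exact key format and hash choices below are assumptions, not the repo's real derivation:

```python
import hashlib


def extraction_cache_key(document_text: str, prompt: str) -> str:
    # Data hash plus prompt hash: editing the extraction prompt misses the cache.
    data_hash = hashlib.sha256(document_text.encode()).hexdigest()[:16]
    prompt_hash = hashlib.sha256(prompt.encode()).hexdigest()[:8]
    return f"extract_{data_hash}_{prompt_hash}"


def evaluator_cache_key(cache_key_prefix: str, review_data_json: str) -> str:
    # No prompt or version hash here: editing the evaluator's system prompt
    # silently hits stale entries unless cache_key_prefix is bumped.
    data_hash = hashlib.sha256(review_data_json.encode()).hexdigest()[:16]
    return f"{cache_key_prefix}_{data_hash}"
```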
### Provider-aware cache layout
Cache files live under `fixtures/cached_llm_responses/`. When `LLM_PROVIDER` is set, the client scopes reads/writes into a provider subdirectory (path resolution is sketched after this list):
- Read order (`LLM_MODE=cache`): `fixtures/cached_llm_responses/<provider>/<key>.json`, falling back to `fixtures/cached_llm_responses/<key>.json` (the shared baseline shipped in the repo). A total miss raises `LLMCacheMiss` naming both paths.
- Write location (`LLM_MODE=record`): `fixtures/cached_llm_responses/<provider>/<key>.json` when `LLM_PROVIDER` is set, else the shared top-level path. This keeps provider-specific recordings from clobbering the shared baseline.
- If `LLM_PROVIDER` is unset, behavior matches the pre-existing shared layout exactly.
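A sketch of that path resolution. Only `LLMCacheMiss` and the directory layout come from the text above; the function names and `pathlib`-style lookups are illustrative:

```python
import os
from pathlib import Path

CACHE_ROOT = Path("fixtures/cached_llm_responses")


class LLMCacheMiss(FileNotFoundError):
    pass


def cache_read_path(key: str) -> Path:
    # Provider subdirectory first, shared baseline second.
    provider = os.environ.get("LLM_PROVIDER")
    candidates = []
    if provider:
        candidates.append(CACHE_ROOT / provider / f"{key}.json")
    candidates.append(CACHE_ROOT / f"{key}.json")
    for path in candidates:
        if path.exists():
            return path
    raise LLMCacheMiss(f"No cached response for {key!r}; looked in: {candidates}")


def cache_write_path(key: str) -> Path:
    # Recordings go to the provider subdirectory when LLM_PROVIDER is set,
    # so they never clobber the shared baseline.
    provider = os.environ.get("LLM_PROVIDER")
    base = CACHE_ROOT / provider if provider else CACHE_ROOT
    return base / f"{key}.json"
```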
## Running tests
```bash
make test
```

Tests run with `LLM_MODE=cache` by default (set in `tests/conftest.py`). They never hit the network.
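For reference, a plausible `tests/conftest.py` snippet enforcing this; the real fixture may be wired differently, and the autouse monkeypatch below is an assumption:

```python
import pytest


@pytest.fixture(autouse=True)
def _force_llm_cache_mode(monkeypatch: pytest.MonkeyPatch) -> None:
    # Every test reads from fixtures/cached_llm_responses/; a total cache
    # miss raises LLMCacheMiss instead of hitting the network.
    monkeypatch.setenv("LLM_MODE", "cache")
```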