Classifier-Style Agents
Use this reference for agents that extract structure, rank candidates, classify actions, route work, moderate content, or gate automation.
Review Sequence
- Load the domain contract before reviewing prompt wording.
- Separate deterministic stages from model-driven stages.
- Check retrieval or candidate quality before blaming the model.
- Check thresholds, post-model validation, and human-review boundaries.
- Require slice-based evals before recommending larger architecture changes.
If the system has no written schema, taxonomy, or action set, flag that gap first.
Common Failure Types
- false positive final action
- false negative or missed match
- wrong abstain or over-escalation
- bad candidate set or missing evidence
- schema drift or normalization mismatch
- overconfident outputs on weak evidence
- threshold regression after prompt changes
- cost or latency regressions from unnecessary tool use
Design Rules
- Keep the action set explicit and mutually exclusive.
- Add an abstain, `no_match`, or manual-review path when evidence can be weak.
- Keep candidate sets structured and comparable on decisive fields.
- Bias the runtime against costly false positives.
- Include a confidence score only if it drives thresholds or downstream policy.
- Validate model outputs after generation, not only in the prompt.
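
The rules above can be sketched as a small post-generation gate. This is a minimal illustration under stated assumptions, not a reference implementation: the action names `match` and `create`, the `gate` function, the `Decision` shape, and the `AUTO_THRESHOLD` value are all hypothetical. It shows an explicit action set, an abstain path, and validation that runs after the model responds rather than relying on the prompt.

```typescript
// Hypothetical action set for illustration; a real one comes from the domain contract.
const ACTIONS = ["match", "create", "no_match"] as const;
type Action = (typeof ACTIONS)[number];

interface Decision {
  action: Action;
  candidateId: number | null;
  confidence: number; // expected range: 0 to 1
}

type Outcome =
  | { kind: "auto"; decision: Decision }
  | { kind: "review"; reason: string };

// Assumed policy value; set it from eval results, not intuition.
const AUTO_THRESHOLD = 0.9;

function isAction(value: unknown): value is Action {
  return ACTIONS.some((action) => action === value);
}

// Validate raw model output, then decide between automation and human review.
function gate(raw: unknown, candidateIds: readonly number[]): Outcome {
  if (typeof raw !== "object" || raw === null) {
    return { kind: "review", reason: "output is not an object" };
  }
  const { action, candidateId, confidence } = raw as Record<string, unknown>;
  if (!isAction(action)) {
    return { kind: "review", reason: "action outside the declared set" };
  }
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    return { kind: "review", reason: "confidence missing or out of range" };
  }
  // Only "match" may carry an id, and it must be one the runtime offered.
  let id: number | null = null;
  if (action === "match") {
    if (typeof candidateId !== "number" || !candidateIds.some((c) => c === candidateId)) {
      return { kind: "review", reason: "id not in candidate set" };
    }
    id = candidateId;
  }
  // Weak evidence goes to review instead of triggering an automatic action.
  if (confidence < AUTO_THRESHOLD) {
    return { kind: "review", reason: "confidence below threshold" };
  }
  return { kind: "auto", decision: { action, candidateId: id, confidence } };
}
```

Every failure path in this sketch returns `review` rather than throwing, so a malformed or low-confidence output costs a human look instead of a wrong automatic action.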
Bottleneck Clues
| Symptom | Likely bottleneck |
|---|---|
| The model chooses the wrong item from a good candidate set | Decision policy, prompt clarity, tool descriptions, or calibration |
| The right item is not present when the model decides | Retrieval, normalization, or candidate generation |
| Model output looks good but persisted state is wrong | Output schema, validation, or integration bugs |
| Confidence is high on weak evidence | Thresholding, confidence semantics, or missing abstain rules |
| Queue load spikes after an "accuracy" change | Threshold regression, auto-action policy, or precision/recall imbalance |
Repo Anchor: Peated Bottle Matcher
When the task is about the current bottle matcher or label extractor, read:
- `docs/development/schema-conventions.md`
- `apps/server/src/agents/whisky/guidance.ts`
- `apps/server/src/agents/priceMatch/classifyStorePriceMatch.ts`
- `apps/server/src/lib/priceMatchingProposals.ts`
- `apps/server/src/schemas/priceMatches.ts`
- `apps/server/src/lib/priceMatching.test.ts`
- `apps/server/src/schemas/priceMatches.test.ts`
Then check:
- extraction conservatism: prefer `null` or `[]` over guessing
- decisive identity fields: producer, distillery, expression, series, edition, age, cask flags, ABV, and years
- candidate generation before web search
- action set boundaries: `match_existing`, `correction`, `create_new`, `no_match`
- confidence normalization and automation thresholds
- server-side sanitization of ids and proposed entities
- non-whisky rejection and human-review boundaries
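
As a rough illustration of two of the checks above (extraction conservatism and server-side sanitization of ids), the sketch below shows the kind of behavior to look for. It is not the repository's code: the `Proposal` shape, the field names `bottleId`, `producer`, `statedAge`, and `caskTypes`, and both helper functions are hypothetical. It also assumes that only `match_existing` and `correction` reference an existing record, which should be confirmed against the actual schema.

```typescript
// Action names are from the checklist above; all other names are hypothetical.
type MatchAction = "match_existing" | "correction" | "create_new" | "no_match";

interface Proposal {
  action: MatchAction;
  bottleId: number | null;
  extracted: {
    producer: string | null;
    statedAge: number | null;
    caskTypes: string[];
  };
}

// Blank strings carry no evidence, so they collapse to null instead of a guess.
function cleanText(value: string | null): string | null {
  const trimmed = value === null ? "" : value.trim();
  return trimmed.length > 0 ? trimmed : null;
}

function sanitizeProposal(raw: Proposal, candidateIds: readonly number[]): Proposal {
  const age = raw.extracted.statedAge;
  const extracted = {
    producer: cleanText(raw.extracted.producer),
    statedAge: age !== null && isFinite(age) && age > 0 ? age : null,
    caskTypes: raw.extracted.caskTypes.map((c) => c.trim()).filter((c) => c.length > 0),
  };

  // Assumption: only these two actions reference an existing record.
  const needsId = raw.action === "match_existing" || raw.action === "correction";
  if (!needsId) {
    return { action: raw.action, bottleId: null, extracted };
  }
  // An id the server never offered is not trusted; fall back to no_match.
  const offered = raw.bottleId !== null && candidateIds.some((id) => id === raw.bottleId);
  return offered
    ? { action: raw.action, bottleId: raw.bottleId, extracted }
    : { action: "no_match", bottleId: null, extracted };
}
```

The design choice worth checking in the real code is the same one made here: the server, not the model, decides which ids are admissible.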
Eval Minimum
Require:
- confusion-style breakdown by action or class
- hard-slice examples for decisive error modes
- trace or tool-call review
- cost, latency, and tool-usage metrics
- before vs after comparison for proposed changes
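
A confusion-style breakdown needs very little machinery. The sketch below is one possible shape, with hypothetical names throughout (`EvalCase`, `confusionByAction`, `falsePositives`); it assumes each eval case records an expected and a predicted action as plain strings.

```typescript
// Hypothetical eval record: one labeled case and what the agent chose.
interface EvalCase {
  expected: string;
  predicted: string;
}

// table[expected][predicted] = number of cases in that cell.
type Confusion = Record<string, Record<string, number>>;

function confusionByAction(cases: readonly EvalCase[]): Confusion {
  const table: Confusion = {};
  for (const { expected, predicted } of cases) {
    const row = (table[expected] ??= {});
    row[predicted] = (row[predicted] ?? 0) + 1;
  }
  return table;
}

// Count cases where `action` was predicted but a different action was expected.
function falsePositives(table: Confusion, action: string): number {
  let total = 0;
  for (const expected of Object.keys(table)) {
    if (expected !== action) {
      total += table[expected]?.[action] ?? 0;
    }
  }
  return total;
}
```

Running `confusionByAction` on the same labeled cases before and after a proposed change, then comparing `falsePositives` for the costly actions, gives the before vs after comparison without hiding regressions inside a single accuracy number.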