description: ADK agent design (callbacks, state, composition, Gemini config, graceful degradation)
paths: ["src/supply_chain_triage/modules/*/agents/**"]
Agent rules
Every agent is a subpackage: agent.py, prompts/*.md, schemas.py, tools.py. Co-located, not central. prompts/ is a folder of markdown files (not a single prompt.py) — chosen for long multi-section prompts and diff readability.
1. Callback placement
Five ADK hooks, each with distinct return semantics:
| Hook | None returned | Object returned |
|---|---|---|
| before_agent_callback | proceed | types.Content: skip agent, use as final output |
| before_model_callback | proceed | LlmResponse: skip LLM, use as response |
| after_model_callback | use LLM output | LlmResponse: replace LLM output |
| before_tool_callback | proceed | dict: skip tool, use as tool result |
| after_tool_callback | use tool output | dict: replace tool output |
Canonical uses:
- before_agent: load exception context from Firestore into state, request-scoped setup, audit log entry.
- before_model: input guardrails, PII redaction, prompt validation, cached-response short-circuit.
- after_model: format enforcement (output_schema validation recovery), strip hallucinated fields, add disclaimers.
- before_tool: argument validation, authz, cost caps, mocked responses in tests.
- after_tool: normalize results, mask secrets, translate error schemas.
Never put business logic in callbacks (belongs in tools). Never put retry loops in after_model (use LoopAgent). Never Firestore-write in before_model (latency on critical path).
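The return semantics in the table can be sketched with a before_tool guard. This is a minimal illustration, not project code: `SimpleNamespace` objects stand in for the ADK tool and tool context so the snippet runs without ADK installed, and `MAX_LOOKUPS_PER_INVOCATION` and the `temp:lookup_calls` key are hypothetical names.

```python
from types import SimpleNamespace
from typing import Any, Optional

# Hypothetical cost cap, used only for illustration.
MAX_LOOKUPS_PER_INVOCATION = 3


def before_tool_guard(tool: Any, args: dict, tool_context: Any) -> Optional[dict]:
    """Return None to let the tool run, or a dict to skip it and use the dict as the result."""
    # Argument validation: reject calls that lack an exception ID.
    if tool.name == "lookup_exception" and not args.get("exception_id"):
        return {"status": "error", "reason": "exception_id is required"}

    # Cost cap: count calls in invocation-scoped state (temp: keys are never persisted).
    calls = tool_context.state.get("temp:lookup_calls", 0)
    if calls >= MAX_LOOKUPS_PER_INVOCATION:
        return {"status": "error", "reason": "lookup budget exhausted"}
    tool_context.state["temp:lookup_calls"] = calls + 1

    return None  # proceed with the real tool


# Stand-ins for the ADK objects (assumption: only .name and .state are needed here).
tool = SimpleNamespace(name="lookup_exception")
ctx = SimpleNamespace(state={})

ok = before_tool_guard(tool, {"exception_id": "EXC-1"}, ctx)
rejected = before_tool_guard(tool, {}, ctx)
```

The guard only validates and counts; the lookup itself stays in the tool, in line with the rule that business logic does not belong in callbacks.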
2. State namespacing
session.state is a string-keyed, JSON-serializable dict. Prefixes scope lifetime:
| Prefix | Scope |
|---|---|
| none | session-scoped |
| user: | per user across sessions (same app_name) |
| app: | global to the app |
| temp: | this invocation only, never persisted |
Module-scoped keys for this project: triage:exception_id, triage:classification, triage:impact, triage:resolution. When port_intel/ lands under Meta-Coordinator, use port_intel:* — the prefix prevents collisions.
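A small helper can make the distinction between lifetime prefixes and module prefixes explicit. This sketch assumes that module prefixes such as triage: are a project convention only, so those keys fall under the "none" row of the table and stay session-scoped; `scope_of` and `module_key` are hypothetical helper names.

```python
# Lifetime prefixes from the table above, mapped to the scope they select.
LIFETIME_PREFIXES = {"user:": "user", "app:": "app", "temp:": "invocation"}


def scope_of(key: str) -> str:
    """Return the lifetime scope implied by a state key's prefix."""
    for prefix, scope in LIFETIME_PREFIXES.items():
        if key.startswith(prefix):
            return scope
    # Anything else, including module prefixes like triage:, is session-scoped.
    return "session"


def module_key(module: str, name: str) -> str:
    """Build a module-scoped key such as triage:impact."""
    return f"{module}:{name}"


impact_key = module_key("triage", "impact")
```

Building keys through one helper keeps the module prefix consistent when a second module such as port_intel starts writing to the same session.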
3. Never mutate session.state directly
Inside tools and callbacks, mutate via the context:
tool_context.state["triage:severity"] = "HIGH"
Not:
session.state["triage:severity"] = "HIGH" # WRONG — bypasses event tracking and persistence
ADK captures context mutations as EventActions.state_delta and writes atomically through SessionService.
4. Cross-agent data passing
Standard pattern: output_key= on the upstream agent auto-writes the final output to state[output_key]. Downstream agents reference {output_key} in their instruction template.
classifier = LlmAgent(name="classifier", ..., output_key="triage:classification")
impact = LlmAgent(
    name="impact",
    instruction="Given {triage:classification}, assess business impact...",
    output_key="triage:impact",
)
pipeline = SequentialAgent(sub_agents=[classifier, impact])
Keep large domain objects in Firestore keyed by ID; store only the ID + small derived fields in state.
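The "ID plus small derived fields" rule might look like the following. The document shape, the `state_fields_for` helper, and the choice of severity as the derived field are all assumptions made for illustration.

```python
import json


def state_fields_for(exception_doc: dict) -> dict:
    """Reduce a full exception document to the small fields worth keeping in state."""
    fields = {
        "triage:exception_id": exception_doc["id"],
        "triage:severity": exception_doc.get("severity", "UNKNOWN"),
    }
    # State must be JSON-serializable; json.dumps raises TypeError if it is not.
    json.dumps(fields)
    return fields


# Hypothetical Firestore document with a large nested payload.
full_doc = {
    "id": "EXC-1042",
    "severity": "HIGH",
    "shipment_lines": [{"sku": f"SKU-{i}"} for i in range(500)],
}
compact = state_fields_for(full_doc)
```

Downstream tools can re-fetch the full document by ID when they need it, so the 500-line payload never travels through session state.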
5. Structured output + tools mutual exclusion on Gemini 2.5 Flash
output_schema on an LlmAgent forbids tools or sub-agent transfer. Gemini 3.0 lifts this, but Gemini 2.5 Flash does not. Canonical workaround (two-agent pattern):
fetcher = LlmAgent(
    name="fetcher",
    tools=[lookup_exception],
    output_key="raw_exception",
)
formatter = LlmAgent(
    name="formatter",
    instruction="Format {raw_exception} as JSON matching the schema.",
    output_schema=ClassificationOutput,
    output_key="triage:classification",
)
classifier = SequentialAgent(sub_agents=[fetcher, formatter])
Recovery when structured output breaks (long prompts, deeply nested schemas, union types): validate in after_model_callback by calling Model.model_validate_json(...) inside a try block; on failure, return a corrective LlmResponse or escalate via LoopAgent.
Keep Tier 1 schemas flat — primitives, short enums, no deep nesting, no untagged unions.
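The validation step in the recovery rule can be sketched without any dependencies. The text names Pydantic's `model_validate_json`; the version below swaps in a stdlib check against a hypothetical flat schema (`REQUIRED_FIELDS`, `ALLOWED_SEVERITIES`) so that it runs on its own. A real callback would wrap the returned message in a corrective response.

```python
import json
from typing import Optional

# Hypothetical flat schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"category": str, "severity": str, "confidence": (int, float)}
ALLOWED_SEVERITIES = {"LOW", "MEDIUM", "HIGH"}


def validation_error(raw: str) -> Optional[str]:
    """Return None when raw matches the schema, else a short reason for the corrective prompt."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return "top-level value must be an object"
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            return f"missing field: {name}"
        if not isinstance(data[name], expected):
            return f"wrong type for {name}"
    if data["severity"] not in ALLOWED_SEVERITIES:
        return "severity outside allowed enum"
    return None


good = validation_error('{"category": "delay", "severity": "HIGH", "confidence": 0.9}')
bad = validation_error('{"category": "delay"}')
```

A flat schema keeps this check to a single pass over top-level fields, which is part of why the rule above asks for primitives and short enums only.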
6. Agent composition
| Type | When to use |
|---|---|
| SequentialAgent | Deterministic pipeline (Classifier → Impact) |
| ParallelAgent | Fan-out independent work (Impact + Route-Optimization on same exception in Tier 3) |
| LoopAgent | Repeat until convergence (Tier 2 Generator-Judge; bounded by max_iterations, and the judge sets the escalate action to exit early) |
| LlmAgent with sub_agents=[...] | LLM-decided routing (Coordinator): picks child based on each child's description= |
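The LoopAgent row describes a bounded generate-and-judge cycle. The following plain-Python model shows only that control flow, under the assumption that the judge's accept decision plays the role of the escalate signal; it is not how ADK implements LoopAgent, and `run_until_accepted` and the toy callables are hypothetical.

```python
from typing import Callable, Optional, Tuple

Generate = Callable[[Optional[str]], str]
Judge = Callable[[str], Tuple[bool, Optional[str]]]


def run_until_accepted(
    generate: Generate, judge: Judge, max_iterations: int
) -> Tuple[str, int, bool]:
    """Loop generator then judge until the judge accepts or the iteration cap is hit."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback: Optional[str] = None
    draft = ""
    for iteration in range(1, max_iterations + 1):
        draft = generate(feedback)
        accepted, feedback = judge(draft)
        if accepted:  # stands in for the judge escalating to exit the loop
            return draft, iteration, True
    return draft, max_iterations, False  # cap reached without acceptance


def toy_generate(feedback: Optional[str]) -> str:
    return "reroute via alternate port" if feedback else "wait"


def toy_judge(draft: str) -> Tuple[bool, Optional[str]]:
    if draft == "wait":
        return False, "propose a concrete action"
    return True, None


result = run_until_accepted(toy_generate, toy_judge, max_iterations=3)
```

The iteration cap matters as much as the exit signal: without it, a judge that never accepts would keep the loop running indefinitely.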
7. Terse-coordinator rule
Coordinator instruction stays under ~20 lines. Delegation logic goes in each child's description= field, which is what the Coordinator LLM actually sees when routing.
Bad: 200-line coordinator prompt that restates every child's behavior.
Good: 10-line router + rich per-child description= strings.
8. Thinking-budget defaults per role (Gemini 2.5 Flash)
from google.genai.types import GenerateContentConfig, ThinkingConfig
# Classifier / Impact — structured, fast
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=1024))
# Resolution Generator (Tier 2) — creative, longer reasoning
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=4096))
# Judge (Tier 2) — fast pass/fail
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=0))
# Comms drafter (Tier 3)
GenerateContentConfig(thinking_config=ThinkingConfig(thinking_budget=1024))
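One way to keep these defaults in a single place is a role-to-budget table. The budget values are the ones listed above; the role keys and the `thinking_budget_for` helper are hypothetical names, and the result would be passed to `ThinkingConfig(thinking_budget=...)` by the caller.

```python
# Budgets copied from the per-role defaults above; role keys are illustrative.
THINKING_BUDGETS = {
    "classifier": 1024,
    "impact": 1024,
    "resolution_generator": 4096,
    "judge": 0,
    "comms_drafter": 1024,
}


def thinking_budget_for(role: str) -> int:
    """Look up the default thinking budget, failing loudly on unknown roles."""
    try:
        return THINKING_BUDGETS[role]
    except KeyError:
        raise ValueError(f"no thinking budget defined for role {role!r}") from None


judge_budget = thinking_budget_for("judge")
```

Raising on an unknown role, rather than falling back silently, means a new agent has to be given a budget explicitly.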
9. Safety settings
Default Gemini thresholds can block logistics terms such as "strike" and "hazard cargo". For internal supply-chain content, loosen to BLOCK_ONLY_HIGH:
from google.genai.types import SafetySetting, HarmCategory, HarmBlockThreshold
safety_settings = [
    SafetySetting(
        category=HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
        threshold=HarmBlockThreshold.BLOCK_ONLY_HIGH,
    ),
    # ... same pattern for other categories
]
10. Streaming
Stream tokens only from the final agent. Never stream intermediate SequentialAgent steps — it leaks raw JSON fragments to users.
FastAPI pattern: wrap Runner.run_async as an async generator, emit SSE (text/event-stream), filter on event.is_final_response() + event.partial.
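The filtering step can be sketched independently of FastAPI and ADK. Here `FakeEvent` is a stand-in that mimics only the attributes the filter reads (`author`, `partial`, `is_final_response()`), `fake_run` replaces `Runner.run_async`, and the `delta`/`done` event names are an assumption rather than a fixed protocol.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class FakeEvent:
    """Stand-in for an ADK event; carries only what the filter needs."""
    author: str
    text: str
    partial: bool = False
    final: bool = False

    def is_final_response(self) -> bool:
        return self.final


async def sse_frames(events: AsyncIterator[FakeEvent], final_agent: str) -> AsyncIterator[str]:
    """Yield SSE frames for the final agent only; drop intermediate pipeline steps."""
    async for event in events:
        if event.author != final_agent:
            continue  # intermediate step: raw JSON must not reach the user
        # Note: text containing newlines would need one data: line per line.
        if event.partial:
            yield f"event: delta\ndata: {event.text}\n\n"
        elif event.is_final_response():
            yield f"event: done\ndata: {event.text}\n\n"


async def fake_run() -> AsyncIterator[FakeEvent]:
    yield FakeEvent("classifier", '{"category": "delay"}', final=True)
    yield FakeEvent("comms", "Shipment ", partial=True)
    yield FakeEvent("comms", "delayed.", partial=True)
    yield FakeEvent("comms", "Shipment delayed.", final=True)


async def collect() -> List[str]:
    return [frame async for frame in sse_frames(fake_run(), final_agent="comms")]


frames = asyncio.run(collect())
```

In a FastAPI route, a generator like `sse_frames` would typically be handed to a streaming response with media type text/event-stream.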
11. Never hand-write A2A
When an A2A surface is needed (Tier 3): uvx agent-starter-pack create ... --agent adk_a2a. Lift scaffolded files into runners/. Artifacts that are never hand-written:
- A2aAgentExecutor
- AgentCardBuilder
- agent.json
- A2AFastAPIApplication mount
- Agent Engine CI/CD glue
12. Graceful degradation
If a sub-agent fails, the Coordinator must still return whatever it has. Concrete rule for triage:
If {triage:impact} is missing in state, the Coordinator returns classification only with impact_available=false. Never 500.
Model this via Coordinator instruction:
"If {triage:impact} is not present, report classification only and note the impact assessment was unavailable."
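The same rule can also be expressed at the response-building layer. This is a sketch under the assumption that the API handler assembles its payload from session state; `build_triage_response` and the payload shape are hypothetical.

```python
def build_triage_response(state: dict) -> dict:
    """Assemble the API payload, degrading to classification-only when impact is missing."""
    response = {
        "classification": state.get("triage:classification"),
        "impact_available": False,
    }
    impact = state.get("triage:impact")
    if impact is not None:
        response["impact"] = impact
        response["impact_available"] = True
    return response


# Impact sub-agent failed: only the classification made it into state.
partial = build_triage_response({"triage:classification": "carrier_delay"})
# Both sub-agents succeeded.
complete = build_triage_response(
    {"triage:classification": "carrier_delay", "triage:impact": "2 orders at risk"}
)
```

Because a missing key is handled as ordinary data rather than as an exception, the handler has no code path that turns a failed sub-agent into a 500.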
13. No direct Firestore/Firebase imports
agent.py imports from google.adk.*. It does not import firebase_admin or google.cloud.firestore. All data access goes through tools. Enforced by ruff TID251 — see .claude/rules/imports.md.