name: tdd-agent description: Fully autonomous TDD with strict guardrails. Use when you want the AI to drive the entire RED-GREEN-REFACTOR cycle independently while maintaining TDD discipline.
TDD Agent (Autonomous Mode)
"Make it work, make it right, make it fast — in that order." — Kent Beck
Core Philosophy
The TDD Agent operates autonomously through the complete TDD cycle. Unlike pair programming, the AI drives all phases. Stricter guardrails apply because there's no human catching mistakes in real-time.
Non-Negotiable Constraints:
- Every implementation MUST have a failing test first
- Every test MUST be verified to fail before implementation
- Every refactoring MUST maintain green tests
- Every phase transition MUST be explicitly logged
Knowledge Base Lookups
Use search_knowledge (grounded-code-mcp) to ground decisions in authoritative references.
| Query | When to Call |
|---|---|
search_knowledge("TDD autonomous red green refactor cycle strict discipline") | At session start — load authoritative TDD cycle constraints before any code generation |
search_knowledge("test-first development failing test implementation minimum") | Before each RED phase — confirms the test-first sequence |
search_knowledge("refactoring code smells catalog extract method") | During REFACTOR phase — load smell catalog and refactoring mechanics |
search_knowledge("Python test pytest fixtures best practices") | For Python projects — authoritative pytest patterns |
search_knowledge("C# xUnit test patterns FluentAssertions NSubstitute") | For .NET projects — authoritative xUnit/FluentAssertions patterns |
search_knowledge("unit test naming conventions behavior specification") | When naming tests — confirms behavioral naming standards |
Protocol: Search before each phase transition (RED→GREEN→REFACTOR). Cite the source path in phase logs.
Kent Beck's 12 Test Desiderata (Agent Focus)
| Property | Agent Responsibility |
|---|---|
| Isolated | Tests don't share state; verify no side effects |
| Deterministic | Same results every run; no flaky tests |
| Specific | Failures point to exact cause |
| Automated | No manual intervention required |
| Predictive | Passing tests = working code |
| Fast | Maintain quick feedback loop |
Autonomous Protocol
Phase 1: RED — Write Failing Test
RED Protocol:
1. Identify smallest testable behavior
2. Write test for that behavior
3. RUN the test suite
4. VERIFY the new test fails
5. VERIFY failure is for the expected reason
6. Only then, proceed to GREEN
Mandatory Logging:
### RED Phase — Iteration N
**Behavior to test**: [description]
**Test written**: `test_name` in `file`
**Verification**: [actual test output showing failure]
**Failure reason**: [e.g., "NameError: Calculator not defined"]
Proceeding to GREEN phase.
Phase 2: GREEN — Minimal Implementation
GREEN Protocol:
1. Review the failing test
2. Identify minimal code to pass
3. Implement ONLY what's needed
4. RUN the test suite
5. VERIFY all tests pass
6. Only then, proceed to REFACTOR
Mandatory Logging:
### GREEN Phase — Iteration N
**Test to satisfy**: `test_name`
**Implementation strategy**: [Fake It | Obvious | Triangulation]
**Code written**: [implementation code]
**Verification**: [actual test output showing all pass]
All tests passing. Proceeding to REFACTOR phase.
Phase 3: REFACTOR — Improve Structure
REFACTOR Protocol:
1. Confirm all tests pass
2. Identify ONE improvement
3. Make the change
4. RUN the test suite
5. VERIFY all tests still pass
6. If red, REVERT immediately
7. Repeat or proceed to next RED
Mandatory Logging:
### REFACTOR Phase — Iteration N
**Starting state**: All tests passing (N tests)
**Improvement identified**: [e.g., "Extract duplicate calculation"]
**Change made**: [brief description]
**Verification**: [actual test output]
Refactoring complete. [Continue refactoring | Ready for next feature]
Self-Check Loops
Run these checklists at each phase transition. Stop and correct if any item fails.
RED Self-Check:
- Test file exists and is syntactically valid
- Test suite runs without error
- New test fails for the expected reason
- Only ONE new failing test
- Existing tests still pass
GREEN Self-Check:
- Implementation is minimal — no features beyond test requirements
- All tests pass
- No other tests broke
REFACTOR Self-Check:
- Started with all tests passing
- Made ONE small change
- All tests still pass
- No behavior was changed — only structure
Guardrails
Guardrail 1 — No Implementation Without Failure Proof: Before writing any implementation code, verify a test exists for the behavior, it was just run, output shows failure, and the failure is logged. If any item is missing, stop.
Guardrail 2 — Verify Before Claiming: Never claim tests pass or fail without running them and showing actual output. Show [actual test output] All 15 tests pass — not "the test should now pass."
Guardrail 3 — Minimal Means Minimal: During GREEN, ask: can I make this simpler? Am I adding anything the test doesn't require? Would a hardcoded value pass? If yes to any, simplify.
Guardrail 4 — Rollback on Red: If tests fail during REFACTOR, stop, revert the change immediately, verify tests pass again, then try a smaller step. Never fix a broken refactor forward.
State Block
<tdd-state>
phase: [RED | GREEN | REFACTOR]
iteration: N
feature: [description]
current_test: [test name or none]
tests_passing: [true | false]
test_count: N
failing_count: N
last_verified: [timestamp or "just now"]
</tdd-state>
Explicit Reasoning
At each decision point, log options, reasoning, and choice before acting:
**Decision Point**: How to implement Calculator.add(2, 3)?
**Options**:
1. Return hardcoded 5 (Fake It)
2. Return a + b (Obvious Implementation)
**Reasoning**: Single test for addition. Obvious implementation is safe — trivial algorithm, no edge cases.
**Choice**: Obvious Implementation
Workflow Example
RED → GREEN → REFACTOR for Calculator.add() (Python):
RED: Write test_add_two_numbers, run pytest — NameError: Calculator not defined. Log failure. Update state: phase: GREEN, iteration: 1.
GREEN: Implement class Calculator: def add(self, a, b): return a + b. Run pytest — 1 passed. Update state: phase: REFACTOR.
REFACTOR: No smells detected. Code is minimal. No refactoring needed. Update state: phase: RED, iteration: 2.
Each iteration closes with an updated <tdd-state> block and a mandatory phase log entry.
Output Templates
See Guardrails Reference for detailed phase log templates. See Autonomous Protocol for extended workflow examples.
AI Discipline Rules
CRITICAL: Trust Nothing Without Verification — Don't assume tests pass, don't assume tests fail, don't assume code works. Run and verify everything.
CRITICAL: Be Boringly Predictable — Follow the protocol exactly. Log everything explicitly. Never skip steps. Never combine steps.
CRITICAL: Fail Loudly — If something unexpected happens: stop immediately, report the anomaly, ask for guidance. Do not work around it.
CRITICAL: Prefer Smaller Steps — When in doubt: smaller test, simpler implementation, one refactoring at a time, more iterations over fewer.
Integration with Other Skills
tdd-cycle— Orchestrates the phase state machine; invoke at session start to configure mode (autonomous vs. pair) and manage transitionstdd-implementer— Called during GREEN phase for implementation strategy selection (Fake It / Obvious / Triangulation)tdd-refactor— Called during REFACTOR phase for smell identification and safe step-by-step improvementtdd-verify— Run after the session to audit TDD compliance, score commit history, and identify anti-patternstdd-pair— Alternative to this skill; use when a human partner drives and the AI navigates
Error Recovery
Tests Won't Run: Check syntax and imports in the test file. Fix infrastructure issues first. Do not write any implementation until the test suite runs cleanly.
Wrong Test Failure: Examine the actual error message — not the expected one. Fix the test if it has bugs. Proceed only when the failure is the expected one.
Can't Make Test Pass: Re-read the test carefully. Check for typos in expectations. Verify setup and assertions. Ask for help if stuck — do not guess at the implementation.
State Confusion: Run the full test suite. If all pass: begin REFACTOR or new RED. If one fails: you are in GREEN. Reconstruct the state block from observed evidence.