---
name: code-test
description: Enforce test-driven development (RED-GREEN-REFACTOR). Use when implementing features, fixing bugs, or changing behavior - write failing test first, then minimal code to pass.
---
# Test-Driven Development
Write the test first. Watch it fail. Write minimal code to pass.
**Core principle:** If you didn't watch the test fail, you don't know if it tests the right thing.
## When to Use

**Always use for:**
- New features
- Bug fixes
- Behavior changes
- Refactoring
**Exceptions (confirm with user):**
- Throwaway prototypes
- Generated code
- Configuration files
**Workflow Integration:**
- Multiple independent tasks from a plan? → Use the `task-dispatch` skill (it enforces TDD per task with quality gates)
- Single implementation task? → Use this skill directly for the TDD workflow
## The Iron Law

**NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST**
Wrote code before the test? Delete it. Start over. No exceptions.
## Red-Green-Refactor Cycle

### 1. RED - Write Failing Test
Write one minimal test showing what should happen.
Requirements:
- One behavior per test
- Clear name describing behavior
- Real code (avoid mocks unless unavoidable)
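A minimal sketch of a RED test, assuming a hypothetical `cart` module whose `total()` function has not been written yet (all names here are illustrative):

```python
# tests/test_cart.py - illustrative sketch; cart.total() does not
# exist yet, which is exactly what the RED step requires.
import cart


def test_total_sums_item_prices():
    # One behavior, clearly named, real code, no mocks.
    assert cart.total([2.50, 3.00]) == 5.50
```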
### 2. Verify RED - Watch It Fail

**MANDATORY. Never skip.**
Run the test. Confirm:
- Test fails (not errors from typos)
- Failure message matches expectation
- Fails because feature is missing
Test passes immediately? You're testing existing behavior. Fix the test.
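Running the sketch above should fail for the right reason. Abbreviated output might look like this (exact format varies by pytest version):

```
$ pytest tests/test_cart.py -v
FAILED tests/test_cart.py::test_total_sums_item_prices
AttributeError: module 'cart' has no attribute 'total'
```

An `AttributeError` or a failed assertion is the expected RED: the test ran and failed because the feature is missing. A `SyntaxError` or misspelled name is not RED; fix the test first.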
### 3. GREEN - Minimal Code
Write the simplest code to pass the test.
**DO:** Just enough to pass, the simplest implementation.
**DON'T:** Add features, refactor other code, or add configurability.
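Continuing the hypothetical cart sketch, the simplest code that makes the test pass:

```python
# cart.py - minimal GREEN implementation for the sketch above
def total(prices):
    # Just enough to pass test_total_sums_item_prices:
    # no validation, no configuration, no speculative features.
    return sum(prices)
```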
### 4. Verify GREEN - Watch It Pass

**MANDATORY.**
Run the test. Confirm:
- Test passes
- Other tests still pass
- No errors or warnings
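In the running sketch, a GREEN run of the suite might look roughly like this (abbreviated; format varies by pytest version):

```
$ pytest tests/ -v
tests/test_cart.py::test_total_sums_item_prices PASSED
1 passed in 0.05s
```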
### 5. REFACTOR - Clean Up
Only after green:
- Remove duplication
- Improve names
- Extract helpers
Keep tests green. Don't add behavior.
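In the running cart sketch, a behavior-preserving refactor might be nothing more than clearer names; the test does not change:

```python
# cart.py - refactor sketch: better name and a docstring, no new behavior
def total(item_prices):
    """Sum the prices of all items in the cart."""
    return sum(item_prices)
```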
### 6. Repeat

Write the next failing test for the next behavior.
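For example, the next cycle in the hypothetical cart sketch drives a discount behavior that does not exist yet, so the new test starts RED:

```python
# tests/test_cart.py - next RED: drives a hypothetical discount feature
def test_total_applies_percentage_discount():
    # Fails with TypeError (unexpected keyword argument) until implemented.
    assert cart.total([10.00], discount=0.5) == 5.00
```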
## Common Rationalizations
| Excuse | Reality |
|---|---|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Already manually tested" | Ad-hoc is not systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Unverified code is debt. |
| "Need to explore first" | Fine. Throw away exploration, then TDD. |
| "Test hard to write" | Hard to test = hard to use. Simplify design. |
## Red Flags - Stop and Start Over
- Code written before test
- Test passes immediately
- Can't explain why test failed
- "Just this once" rationalization
- Keeping code "as reference"
All of these mean: delete the code and start over with TDD.
## Verification Checklist
Before marking work complete:
- Every new function has a test
- Watched each test fail before implementing
- Each test failed for expected reason
- Wrote minimal code to pass each test
- All tests pass
- No errors or warnings in output
## TDD Evidence Format (For Subagent Verification)
When implementing as a subagent, you MUST output this evidence block:
```yaml
tdd_evidence:
  tests_written:
    - name: "test_feature_x"
      file: "tests/test_x.py"
      red_output: "FAILED - [actual failure message]"
      green_output: "PASSED - 1 passed in 0.05s"
  implementation_files:
    - path: "src/feature.py"
  all_tests_pass: true
  test_command: "pytest tests/test_x.py -v"
  final_output: "[full test output]"
```
This is REQUIRED for SubagentStop hook verification.
## Integration

Use with:
- `code-debug` - Write failing test to reproduce bug before fixing
- `task-dispatch` - Subagents follow TDD for each task
- `completion-verify` - Run tests before claiming completion