name: experiment-cycle
description: Run or design an experiment with fixed/varying variables, metrics, and recorded results. Use for ablations, evals, or hyperparameter runs.
Experiment cycle
When to use
- User wants to run an experiment (ablation, eval, hyperparameter, comparison).
- Task involves "vary X, measure Y, record Z".
Input/Output
- Input: What varies, what is fixed, metrics, where to record (path or format).
- Output: Command(s) run, where results were written, facts (numbers), inference (optional), recommendations.
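The input contract above can be captured as a small spec object. This is only a sketch; the field names (`fixed`, `varying`, `metrics`, `results_path`) are illustrative, not a project API:

```python
from dataclasses import dataclass

@dataclass
class ExperimentSpec:
    # Hypothetical shape for the experiment input; adapt names to the project.
    fixed: dict         # what stays constant, e.g. {"seed": 42}
    varying: dict       # what changes, e.g. {"lr": [1e-3, 1e-4]}
    metrics: list       # what to measure, e.g. ["accuracy"]
    results_path: str   # where facts get recorded

spec = ExperimentSpec(
    fixed={"seed": 42},
    varying={"lr": [1e-3, 1e-4]},
    metrics=["accuracy"],
    results_path="results/lr_sweep.json",
)
```

Writing the spec down first makes the later "facts vs. inference" split easier, because every recorded number can be traced back to a declared variable.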
Steps (SOP)
- Define: Fixed variables, varying variables, metrics, acceptance (e.g. run completes, metric threshold).
- Run: Execute with project scripts if any (e.g. run_ablation.py). Use the stated config.
- Record: Write results to the agreed path (file, table). Do not leave results only in chat.
- Conclude: Facts (observed values) | Inference ("likely because...") | Recommendations ("next: ...").
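The define/run/record loop above can be sketched as follows. `run_trial` is a stand-in for whatever project script actually runs (e.g. run_ablation.py); the metric formula inside it is a placeholder, not a real model:

```python
import json
import pathlib

def run_trial(config: dict) -> dict:
    # Placeholder for the real experiment command; returns fake metrics.
    return {"accuracy": 0.5 + 0.1 * config["lr"] * 1000}

# Define: fixed variables, varying variables.
fixed = {"seed": 42}
varying = {"lr": [1e-4, 1e-3]}

# Run: one trial per varying value, keeping the full config with each result.
results = []
for key, values in varying.items():
    for v in values:
        config = {**fixed, key: v}
        results.append({"config": config, "metrics": run_trial(config)})

# Record: write to an agreed path, not just chat.
out = pathlib.Path("results.json")
out.write_text(json.dumps(results, indent=2))
```

Storing the config alongside each metric row is what makes a later "better/worse" claim checkable against a baseline.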
Acceptance criteria
- Results written to a file or artifact. Facts separated from inference. No "improved" without evidence or baseline.
Common failures
- No record: Always write results somewhere reproducible.
- Mixing fact and inference: Label clearly. Do not state "better" without numbers or baseline.
- Under-spec: If config is vague, state assumptions (e.g. seed, single run) and note replication needs.
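One way to avoid the fact/inference mixing described above is a structured conclusion record, where facts hold only observed numbers and interpretation lives in its own labeled field. The field names here are illustrative:

```python
conclusion = {
    # Facts: observed values only, no interpretation.
    "facts": {"baseline_acc": 0.71, "variant_acc": 0.74},
    # Inference: clearly labeled interpretation, with stated caveats.
    "inference": "variant likely helps; single seed, needs replication",
    # Recommendations: concrete next step.
    "recommendations": ["replicate with 3 seeds"],
    # Assumptions: made explicit when the config was under-specified.
    "assumptions": {"seed": 42, "runs": 1},
}
```

A record in this shape cannot quietly claim "better" without a number: any comparative statement has to sit in `inference`, next to its caveats.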
When not to use
- One-off script with no metrics (use plan-then-implement).
- Bug fix (use debug-regression).
- Feature implementation (use implement-feature-with-gates).