name: reproduce-papers
description: Plan and execute reproducible computer science paper reproductions. Use when Codex needs to extract implementation-critical details from a paper, inventory required artifacts and environments, build a runnable draft reproduction, track missing details and deviations, compare reproduced results with the paper, or package the work into reusable experiment notes.
Reproduce Papers
Overview
Use this skill to turn a paper into a traceable reproduction workflow. Move from paper reading to artifact inventory, runnable draft implementation, gap tracking, and result comparison without waiting for every missing detail to be resolved.
Workflow Router
- If the user needs reproduction-critical details from the paper, read references/paper-reading.md.
- If the user needs to inventory code, data, weights, binaries, dependencies, hardware, or benchmark assets, read references/artifact-and-environment.md.
- If the user needs a runnable first-pass reproduction before exact fidelity, read references/draft-first-reproduction.md.
- If the user is blocked by missing details, approximations, or deviations, read references/gap-tracking.md.
- If the user needs to compare reproduced results with the paper, read references/result-comparison.md.
Use multiple references only when the request spans multiple stages.
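The routing rules above can be sketched as a small lookup table. The file paths come from the list above; the stage keys and the function name `references_for` are hypothetical and only illustrate how a multi-stage request might map to several references.

```python
# Hypothetical stage keys; the reference paths are the ones listed in the router.
STAGE_REFERENCES = {
    "paper_reading": "references/paper-reading.md",
    "artifact_inventory": "references/artifact-and-environment.md",
    "draft_reproduction": "references/draft-first-reproduction.md",
    "gap_tracking": "references/gap-tracking.md",
    "result_comparison": "references/result-comparison.md",
}


def references_for(stages):
    """Return the reference files for the requested stages, in order, without duplicates."""
    selected = []
    for stage in stages:
        path = STAGE_REFERENCES[stage]  # an unknown stage raises KeyError on purpose
        if path not in selected:
            selected.append(path)
    return selected
```

A single-stage request yields one file; a request that spans, say, gap tracking and result comparison yields both, in the order the stages were given.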
Core Workflow
- Identify the current stage: paper reading, artifact inventory, draft reproduction, gap resolution, or result comparison.
- Extract reproduction-critical details first: problem definition, inputs and outputs, system boundaries, core method, training or runtime conditions, evaluation protocol, and reported metrics.
- Build the smallest runnable draft once the minimum end-to-end path is clear. Do not wait for every hyperparameter or systems detail to be known.
- Record every unknown, approximation, deviation, and evidence source in gap_tracker.md.
- Compare reproduced results against the paper's main tables, figures, or claims. Explain whether the central conclusion reproduced, not only whether every number matched exactly.
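The numeric part of the comparison step can be sketched as follows. The class name `MetricComparison`, the function `summarize`, and the 5% relative tolerance are assumptions chosen for illustration; an appropriate tolerance depends on the paper and on the variance of the metric.

```python
from dataclasses import dataclass


@dataclass
class MetricComparison:
    name: str
    paper_value: float
    reproduced_value: float
    rel_tolerance: float = 0.05  # assumed threshold, not taken from any paper

    @property
    def relative_gap(self) -> float:
        # Gap relative to the paper's number; falls back to the absolute gap
        # when the paper reports exactly zero.
        denom = abs(self.paper_value) or 1.0
        return abs(self.reproduced_value - self.paper_value) / denom

    @property
    def within_tolerance(self) -> bool:
        return self.relative_gap <= self.rel_tolerance


def summarize(comparisons):
    """Return one human-readable line per metric for the validation report."""
    lines = []
    for c in comparisons:
        status = "match" if c.within_tolerance else "deviates"
        lines.append(
            f"{c.name}: paper={c.paper_value} reproduced={c.reproduced_value} "
            f"gap={c.relative_gap:.1%} ({status})"
        )
    return lines
```

A per-metric table like this only supports the judgment. Whether the central conclusion reproduced still needs a written explanation, for example when every number is slightly lower but the ranking of methods is unchanged.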
Working Rules
- Treat the paper as the primary source of truth. Use the appendix, supplementary material, cited work, benchmark documentation, and official repositories only when the paper leaves a critical gap.
- Do not wait for perfect fidelity before building a baseline.
- Keep plans framework-neutral unless the paper or repository forces a specific stack.
- Prefer explicit assumptions over silent guessing.
- Preserve traceability. When you infer or substitute something, state why it is reasonable and where it should be verified.
- When the paper is not training-centered, reinterpret the runnable draft as the smallest executable evaluation path, benchmark harness, detector pass, or system prototype that still exercises the paper's core claim.
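The traceability rules above can be made concrete with a small helper that formats one gap as a Markdown table row. This is only a sketch: the column set (kind, description, assumption, evidence, next action) and the names `GapEntry` and `append_gap` are hypothetical, and the actual gap_tracker.md template may use a different layout.

```python
from dataclasses import dataclass
from pathlib import Path

GAP_KINDS = ("unknown", "approximation", "deviation")


@dataclass
class GapEntry:
    kind: str          # one of GAP_KINDS
    description: str   # what is missing or different
    assumption: str    # what was assumed or substituted, and why it is reasonable
    evidence: str      # where the assumption comes from or should be verified
    next_action: str   # how to close the gap

    def to_row(self) -> str:
        if self.kind not in GAP_KINDS:
            raise ValueError(f"kind must be one of {GAP_KINDS}, got {self.kind!r}")
        cells = [self.kind, self.description, self.assumption,
                 self.evidence, self.next_action]
        # Escape pipes so a cell cannot break the table row.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"


def append_gap(tracker: Path, entry: GapEntry) -> None:
    """Append one row to the tracker file (assumes it already ends with a table)."""
    with tracker.open("a", encoding="utf-8") as fh:
        fh.write(entry.to_row() + "\n")
```

Requiring every field up front is a deliberate choice here: an entry without an evidence source or next action is exactly the kind of silent guess the rules above try to avoid.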
Standard Workspace
Initialize a new case with:
python scripts/init_repro_case.py --title "Paper Title" --slug paper-slug --out ./cases
This creates <out>/<slug>/ with:
- paper_brief.md
- artifact_inventory.md
- implementation_plan.md
- gap_tracker.md
- experiment_log.csv
- validation_report.md
Template files live under assets/templates/. Use them as scaffolding for the reproduction effort rather than as final polished documentation.
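Once a case exists, each run can be appended to experiment_log.csv. The sketch below assumes the template ships with a header row and reuses whatever columns it finds there, since the real column names are not specified here. The function name `log_run` and the keys in the usage example are hypothetical.

```python
import csv
from pathlib import Path


def log_run(log_path: Path, row: dict) -> None:
    """Append one run to the CSV, using the file's existing header as the schema."""
    text = log_path.read_text(encoding="utf-8")
    header = next(csv.reader(text.splitlines()), None)
    if not header:
        raise ValueError(f"{log_path} has no header row to align with")

    with log_path.open("a", newline="", encoding="utf-8") as fh:
        # Make sure the new row starts on its own line.
        if not text.endswith("\n"):
            fh.write("\n")
        # Missing columns are left blank; keys not in the header are dropped.
        writer = csv.DictWriter(
            fh, fieldnames=header, restval="", extrasaction="ignore"
        )
        writer.writerow(row)
```

For example, `log_run(Path("cases/paper-slug/experiment_log.csv"), {"run_id": "draft-01", "metric": "accuracy", "value": 0.5})` would fill only the columns that the template's header actually defines; the keys and values shown are placeholders.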
Expected Outputs
- A concise paper brief with the reproduction-critical facts.
- An artifact and environment inventory.
- A runnable draft reproduction plan.
- A gap tracker with assumptions, impacts, and next actions.
- A validation report that compares reproduced findings with the paper's main claims.