---
name: academic-paper-verify
description: >
  Thoroughly verify all code, tables, figures, modeling decisions, and quantitative claims in an academic paper against its source R scripts and output files. Use this skill whenever you need to audit, replicate, or verify an academic research paper - including cross-checking LaTeX tables against R output, validating econometric modeling choices, ensuring sample sizes are consistent, building a verification manifest, and running automated replication tests. Trigger this skill for any mention of: paper verification, replication check, table audit, code-paper consistency, reproducing results, verifying estimates, checking coefficients, or any variant of "does the paper match the code."
---
# Academic Paper Verification
A systematic skill for verifying the integrity and replicability of an academic research paper. This covers everything from individual coefficient checks to full end-to-end replication.
## Overview
Verification proceeds in six phases. Each phase produces structured output. Do not skip phases - earlier phases feed into later ones.
- Phase 1: Discovery -> inventory of all project files, scripts, outputs, and the paper
- Phase 2: Table Audit -> cross-check every number in every table
- Phase 3: Inline Claims -> verify quantitative claims in the paper body text
- Phase 4: Code Review -> audit R scripts for correctness, modeling decisions, and the data pipeline
- Phase 5: Manifest Build -> create `verification_manifest.json` linking claims to code
- Phase 6: Replication -> write and run `tests/verify_replication.R`, fix failures
## Before You Start
- Identify the project root directory. Look for `.Rproj` files, a `README`, or ask the user.
- Read `references/phase-details.md` for the full procedure for each phase.
- Read `references/common-pitfalls.md` for known failure modes to watch for.
## Phase 1: Discovery
Scan the entire project and build an inventory. You need to know what you're working with before you can verify anything.
Find and catalog:
- All `.R` and `.Rmd` scripts (note execution order if a master script exists)
- All output files: `.csv`, `.rds`, `.tex`, `.txt`, `.log` in `results/`, `output/`, `tables/`, etc.
- The LaTeX paper file(s): `.tex` in the root or in a `paper/` or `draft/` directory
- Any data files: `.csv`, `.dta`, `.rds`, `.xlsx` in `data/` or similar
- Any configuration or parameter files
Produce: A file inventory printed to the console, organized by type, with notes on what each script appears to do (based on filename and a quick scan of its first ~30 lines).
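A minimal sketch of this scan, assuming the current working directory is the project root (directory layout and file extensions are assumptions; adjust to what you actually find):

```r
# Minimal file inventory sketch (run from the project root).
# Extensions and directory layout are assumptions; adjust to the actual project.
inventory <- list(
  scripts = list.files(".", pattern = "\\.(R|Rmd)$", recursive = TRUE, ignore.case = TRUE),
  outputs = list.files(".", pattern = "\\.(csv|rds|tex|txt|log)$", recursive = TRUE, ignore.case = TRUE),
  data    = list.files(".", pattern = "\\.(csv|dta|rds|xlsx)$", recursive = TRUE, ignore.case = TRUE)
)

# Print the inventory by type, then skim the first ~30 lines of each script
# to note what it appears to do.
str(inventory, vec.len = 10)
for (f in inventory$scripts) {
  cat("\n====", f, "====\n")
  cat(head(readLines(f, warn = FALSE), 30), sep = "\n")
}
```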
Key questions to answer in this phase:
- Is there a master script that runs everything in order?
- Where do intermediate outputs land?
- Which scripts produce which tables/figures?
- Are there any scripts that appear unused or orphaned?
## Phase 2: Table Audit
This is the most critical phase. Read `references/phase-details.md` Section 2 for the full procedure.
For every table in the paper:
- Locate the table in the LaTeX source. Extract every number: coefficients, standard errors, t-statistics, p-values, confidence intervals, sample sizes (N), R-squared, F-statistics, means, medians, percentages - everything.
- Locate the corresponding R output file that produced this table. This might be a `.tex` file generated by `stargazer`, `modelsummary`, `xtable`, `kableExtra`, `huxtable`, or similar. It could also be a `.csv`, `.rds`, or text log.
- Cross-check every single number. Compare to the R output with appropriate tolerance:
  - Coefficients and standard errors: match to the number of decimal places shown
  - Sample sizes: must match exactly
  - R-squared and similar: match to displayed precision
  - Percentages: verify the arithmetic (numerator/denominator)
- Check for rounding consistency - if a coefficient is 0.0347 in the R output and 0.035 in the paper, that is acceptable rounding. If it is 0.038, that is a discrepancy. (A tolerance helper is sketched after this list.)
- Verify that column headers, variable names, and panel labels in the paper match the specification in the code.
- Check that the number of observations (N) is consistent across all tables that use the same sample. If Table 1 reports N=4,521 and Table 3 uses the same sample but reports N=4,519, that needs explanation.
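A minimal sketch of the rounding/tolerance check, assuming you have already extracted a paper value (as printed) and the corresponding raw value from the R output. The helper name is illustrative, not from the project:

```r
# Compare a value as printed in the paper to the raw value in the R output,
# matching to the number of decimal places the paper displays.
# Hypothetical helper, for illustration only.
check_rounding <- function(paper_value, raw_value) {
  shown  <- as.character(paper_value)
  digits <- nchar(sub("^[^.]*\\.?", "", shown))   # decimal places displayed
  ok     <- isTRUE(abs(as.numeric(shown) - round(raw_value, digits)) < 1e-12)
  data.frame(paper = shown, raw = raw_value, digits = digits,
             status = if (ok) "PASS" else "FAIL")
}

check_rounding("0.035", 0.0347)  # PASS - acceptable rounding
check_rounding("0.038", 0.0347)  # FAIL - discrepancy
```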
Produce: A table-by-table verification report. For each table:
- Table number and title
- Source R script and output file
- Number of values checked
- List of any discrepancies with exact locations (paper line number, output file line number)
- PASS/FAIL status
## Phase 3: Inline Claims Audit
Read the paper body text (not just tables) and find every quantitative claim. These include:
- "We find a 3.2 percentage point increase..."
- "The effect is significant at the 5% level..."
- "Our sample includes 12,450 observations..."
- "Column 3 of Table 2 shows that..."
- "The coefficient on X is negative and significant..."
- Footnotes with numbers or statistical claims
- Abstract claims about magnitudes and significance
For each claim, trace it back to a specific table cell, figure, or R output. Flag any claim that cannot be traced or that contradicts the evidence.
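One way to build a first-pass candidate list is to grep the LaTeX source for numeric and significance language. A rough sketch, assuming the paper file identified in Phase 1 (the path below is the one used in the manifest example):

```r
# Flag lines in the paper body that contain numbers, percentages, or
# significance language, as candidates for the claims checklist.
# The file path is an assumption; use the paper file found in Phase 1.
tex <- readLines("paper/main.tex", warn = FALSE)
pattern <- "[0-9]+(\\.[0-9]+)?\\s*(percent|percentage point|pp|%)|significant|N\\s*=\\s*[0-9,]+"
candidates <- grep(pattern, tex, ignore.case = TRUE)
data.frame(line = candidates, text = trimws(tex[candidates]))
```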
Produce: A claims checklist with claim text, source location in paper, evidence source, and VERIFIED/UNVERIFIED/DISCREPANCY status.
## Phase 4: Code Review
Read every R script in the project, in execution order. This is not just a syntax check - you are auditing the analytical pipeline. Read `references/phase-details.md` Section 4 and `references/common-pitfalls.md` for what to look for.
Data Pipeline Verification:
- At every `merge`, `join`, `filter`, `subset`, or `mutate` step, check: (a) how many observations before vs. after the transformation? (b) do all column names needed downstream still exist? (c) are key summary statistics (mean, min, max, N) reasonable after the step? (A before/after check is sketched after this list.)
- Flag any joins that could silently drop or duplicate observations
- Flag any filters that might be too aggressive or too permissive
- Check for proper handling of missing values (NA) - are they dropped, imputed, or ignored?
- Verify that panel/time-series data is properly balanced or that imbalance is handled
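A minimal sketch of the before/after row-count check around a join, using dplyr on illustrative data (in the real audit, wrap these checks around the project's own joins and key columns):

```r
library(dplyr)

# Illustrative data; note the duplicate key (firm 2, year 2005) on the right.
panel      <- tibble(firm_id = c(1, 1, 2, 3), year = c(2004, 2005, 2005, 2005))
firm_chars <- tibble(firm_id = c(1, 2, 2), year = c(2005, 2005, 2005), assets = c(10, 20, 25))

n_before <- nrow(panel)
merged   <- left_join(panel, firm_chars, by = c("firm_id", "year"))
n_after  <- nrow(merged)

# A left join should keep the row count; growth signals duplicate keys, and
# NAs in the joined columns signal unmatched observations.
cat("rows before:", n_before, " after:", n_after, "\n")
cat("unmatched (NA assets):", sum(is.na(merged$assets)), "\n")
anti_join(panel, firm_chars, by = c("firm_id", "year"))  # rows that failed to match
```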
Modeling Decisions:
- Are the regression specifications consistent with what the paper describes? (e.g., if the paper says "we control for year fixed effects", is that in the code? See the sketch after this list.)
- Are standard errors clustered as described? (robust, clustered at the right level, etc.)
- Are instrumental variables correctly specified? (first stage, exclusion restriction checks)
- Is the sample restriction for each regression clearly defined and consistent with the paper?
- Are interaction terms, polynomials, or transformations correctly implemented?
- Do subsample analyses actually use the right subsamples?
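For example, if the paper says "year fixed effects with standard errors clustered by firm", the code should contain an estimation call equivalent to the following sketch. This uses fixest on simulated stand-in data; the project may instead use lfe, plm, or lm with sandwich, and all variable names here are placeholders:

```r
library(fixest)
set.seed(1)

# Simulated stand-in data; in the real audit, compare against the project's
# actual estimation call and analysis data.
analysis_df <- data.frame(
  firm_id   = rep(1:50, each = 6),
  year      = rep(2000:2005, times = 50),
  treatment = rbinom(300, 1, 0.5)
)
analysis_df$outcome <- 0.03 * analysis_df$treatment + rnorm(300)

# Year fixed effects, standard errors clustered by firm - the pattern to look
# for when the paper describes that specification.
m <- feols(outcome ~ treatment | year, data = analysis_df, cluster = ~firm_id)
summary(m)
nobs(m)  # compare against the N reported in the paper's table
```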
Robustness and Red Flags:
- Are there hardcoded values that should be computed? (e.g., `filter(year > 2005)` when the paper says "post-treatment period" without defining the cutoff)
- Are there commented-out lines that suggest alternative specifications were tried? (A mechanical scan for several of these patterns is sketched after this list.)
- Is there any evidence of p-hacking patterns (many specifications tried, only one reported)?
- Are random seeds set for any stochastic procedures?
- Are there warnings or errors being suppressed?
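A rough sketch of a mechanical scan for several of these red flags; the regexes are heuristics and every hit still needs manual review:

```r
# Rough red-flag scan over the project's R scripts; review every hit manually.
scripts <- list.files(".", pattern = "\\.R$", recursive = TRUE)
flags <- c(
  hardcoded_filter = "filter\\([^)]*[<>]=?\\s*[0-9]{4}",  # e.g. filter(year > 2005)
  commented_model  = "^\\s*#.*(lm\\(|feols\\(|glm\\()",    # commented-out specifications
  suppressed       = "suppressWarnings|suppressMessages",
  stochastic       = "sample\\(|rnorm\\(|bootstrap"        # check a set.seed() precedes these
)
for (f in scripts) {
  lines <- readLines(f, warn = FALSE)
  for (nm in names(flags)) {
    hits <- grep(flags[[nm]], lines, perl = TRUE)
    if (length(hits)) cat(f, "|", nm, "| lines:", paste(hits, collapse = ", "), "\n")
  }
}
```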
Produce: A script-by-script review with:
- Script name and purpose
- Data pipeline issues (with line numbers)
- Modeling decision flags (with line numbers)
- Red flags (with line numbers)
- Overall assessment: CLEAN / MINOR ISSUES / MAJOR ISSUES
## Phase 5: Build Verification Manifest
Create `verification_manifest.json` that maps every quantitative claim in the paper to the code that produces it.
Structure:
```json
{
  "paper_file": "paper/main.tex",
  "generated_at": "2026-02-08T12:00:00Z",
  "claims": [
    {
      "id": "T1_R2_C3",
      "type": "coefficient",
      "paper_location": {"file": "paper/main.tex", "line": 234, "context": "Table 1, Row 2, Col 3"},
      "paper_value": "0.035",
      "source_script": "code/02_main_regression.R",
      "source_line": 87,
      "output_file": "results/table1.tex",
      "output_location": {"line": 15, "context": "second coefficient in column 3"},
      "expected_value": "0.0347",
      "tolerance": 0.001,
      "status": "PASS",
      "notes": "Acceptable rounding from 0.0347 to 0.035"
    },
    {
      "id": "BODY_P12_S3",
      "type": "inline_claim",
      "paper_location": {"file": "paper/main.tex", "line": 412, "context": "paragraph 12, sentence 3"},
      "paper_value": "3.2 percentage points",
      "source_script": "code/02_main_regression.R",
      "source_line": 87,
      "output_file": "results/table1.tex",
      "output_location": {"line": 15},
      "expected_value": "0.032",
      "tolerance": 0.001,
      "status": "PASS",
      "notes": "Coefficient 0.0323 reported as 3.2pp"
    }
  ],
  "summary": {
    "total_claims": 142,
    "passed": 139,
    "failed": 2,
    "unverified": 1
  }
}
```
Every coefficient, standard error, sample size, p-value, summary statistic, and verbal claim should appear in this manifest. Be exhaustive.
## Phase 6: Replication Test Suite
Write `tests/verify_replication.R` that programmatically reruns the analysis and checks results against the manifest. Read `references/replication-script-template.md` for the template and structure; a minimal skeleton is also sketched after the requirements list below.
The test script must:
- Source or rerun each analysis script in the correct order
- Extract the relevant outputs (coefficients, SEs, N, R-squared, etc.)
- Compare against the values in `verification_manifest.json`
- Use appropriate tolerance for floating-point comparisons
- Report PASS/FAIL for each claim with clear diagnostics on failure
- Handle dependencies gracefully (if a data file is missing, report it, do not crash)
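A minimal skeleton that satisfies these requirements, assuming jsonlite is available and the manifest format from Phase 5; the value-extraction step is claim-specific and is left as a clearly marked placeholder:

```r
# tests/verify_replication.R - minimal skeleton; adapt the extraction logic per project.
library(jsonlite)

manifest <- fromJSON("verification_manifest.json", simplifyDataFrame = FALSE)

results <- lapply(manifest$claims, function(claim) {
  if (!file.exists(claim$output_file)) {
    # Handle missing dependencies gracefully: report, do not crash.
    return(list(id = claim$id, status = "UNVERIFIED",
                note = paste("missing file:", claim$output_file)))
  }
  # Extraction is claim-specific: rerun the source script or parse the output
  # file and pull the value named in the claim. The placeholder returns NA so
  # the claim is reported UNVERIFIED until real extraction is wired in.
  observed <- NA_real_
  expected <- as.numeric(claim$expected_value)
  status <- if (!is.finite(observed)) {
    "UNVERIFIED"
  } else if (abs(observed - expected) <= claim$tolerance) {
    "PASS"
  } else {
    "FAIL"
  }
  list(id = claim$id, status = status, observed = observed, expected = expected)
})

write_json(results, "tests/replication_results.json", auto_unbox = TRUE, pretty = TRUE)
cat(sum(vapply(results, function(r) r$status == "PASS", logical(1))), "of",
    length(results), "claims passed\n")
```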
After writing the test script:
- Run it
- For any failures, diagnose the root cause
- If the failure is due to a code bug (not a paper-code mismatch), fix the upstream script and document what you fixed
- Rerun until all tests pass or all remaining failures are genuine paper-code discrepancies
- Produce a final summary
Produce:
- `tests/verify_replication.R` - the test script
- `tests/replication_results.json` - structured test results
- `tests/replication_summary.md` - human-readable summary of what passed, what failed, what was fixed, and what remains unresolved
## Output Format
At the end of the full verification, produce a consolidated report. Use this structure:
# Paper Verification Report
## Executive Summary
- Total quantitative claims checked: X
- Passed: Y
- Failed: Z
- Unverified: W
- Code issues found: N (M major, K minor)
## Table-by-Table Results
[from Phase 2]
## Inline Claims Results
[from Phase 3]
## Code Review Findings
[from Phase 4]
## Replication Test Results
[from Phase 6]
## Recommendations
[prioritized list of issues to address]
## Important Notes
- Never silently skip a number. If you cannot verify a value, mark it UNVERIFIED with an explanation.
- When in doubt, flag it. False positives are better than missed discrepancies.
- Pay special attention to N (sample sizes) - these are the most common source of inconsistencies across tables and text.
- If the project uses R packages that produce formatted output (stargazer, modelsummary, etc.), check the raw model objects too, not just the formatted output.
- If you encounter Stata `.do` files or Python scripts mixed in, verify those too using the same principles.
- The user may want you to run this on a subset (e.g., "just check Table 3"). Adapt accordingly, but note what was not checked.