---
name: academic-paper-verify
description: >
  Thoroughly verify all code, tables, figures, modeling decisions, and quantitative claims in an academic paper against its source R scripts and output files. Use this skill whenever you need to audit, replicate, or verify an academic research paper - including cross-checking LaTeX tables against R output, validating econometric modeling choices, ensuring sample sizes are consistent, building a verification manifest, and running automated replication tests. Trigger this skill for any mention of: paper verification, replication check, table audit, code-paper consistency, reproducing results, verifying estimates, checking coefficients, or any variant of "does the paper match the code."
---
# Academic Paper Verification
A systematic skill for verifying the integrity and replicability of an academic research paper. This covers everything from individual coefficient checks to full end-to-end replication.
## Overview
Verification proceeds in six phases. Each phase produces structured output. Do not skip phases - earlier phases feed into later ones.
- Phase 1: Discovery -> inventory of all project files, scripts, outputs, and the paper
- Phase 2: Table Audit -> cross-check every number in every table
- Phase 3: Inline Claims -> verify quantitative claims in the paper body text
- Phase 4: Code Review -> audit R scripts for correctness, modeling decisions, and the data pipeline
- Phase 5: Manifest Build -> create `verification_manifest.json` linking claims to code
- Phase 6: Replication -> write and run `tests/verify_replication.R`, fix failures
## Before You Start
- Identify the project root directory. Look for `.Rproj` files, a `README`, or ask the user.
- Read `references/phase-details.md` for the full procedure for each phase.
- Read `references/common-pitfalls.md` for known failure modes to watch for.
## Phase 1: Discovery
Scan the entire project and build an inventory. You need to know what you're working with before you can verify anything.
Find and catalog:
- All `.R` and `.Rmd` scripts (note execution order if a master script exists)
- All output files: `.csv`, `.rds`, `.tex`, `.txt`, `.log` in `results/`, `output/`, `tables/`, etc.
- The LaTeX paper file(s): `.tex` in the root or in a `paper/` or `draft/` directory
- Any data files: `.csv`, `.dta`, `.rds`, `.xlsx` in `data/` or similar
- Any configuration or parameter files
Produce: A file inventory printed to the console, organized by type, with notes on what each script appears to do (based on filename and a quick scan of its first ~30 lines).
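A minimal sketch of this scan, assuming the current working directory is the project root (directory layout and file extensions are assumptions; adjust to what you actually find):

```r
# Minimal file inventory sketch (run from the project root).
# Extensions and directory layout are assumptions; adjust to the actual project.
inventory <- list(
  scripts = list.files(".", pattern = "\\.(R|Rmd)$", recursive = TRUE, ignore.case = TRUE),
  outputs = list.files(".", pattern = "\\.(csv|rds|tex|txt|log)$", recursive = TRUE, ignore.case = TRUE),
  data    = list.files(".", pattern = "\\.(csv|dta|rds|xlsx)$", recursive = TRUE, ignore.case = TRUE)
)

# Print the inventory by type, then skim the first ~30 lines of each script
# to note what it appears to do.
str(inventory, vec.len = 10)
for (f in inventory$scripts) {
  cat("\n====", f, "====\n")
  cat(head(readLines(f, warn = FALSE), 30), sep = "\n")
}
```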
Key questions to answer in this phase:
- Is there a master script that runs everything in order?
- Where do intermediate outputs land?
- Which scripts produce which tables/figures?
- Are there any scripts that appear unused or orphaned?
## Phase 2: Table Audit
This is the most critical phase. Read `references/phase-details.md` Section 2 for the full procedure.
For every table in the paper:
- Locate the table in the LaTeX source. Extract every number: coefficients, standard errors, t-statistics, p-values, confidence intervals, sample sizes (N), R-squared, F-statistics, means, medians, percentages - everything.
- Locate the corresponding R output file that produced this table. This might be a `.tex` file generated by `stargazer`, `modelsummary`, `xtable`, `kableExtra`, `huxtable`, or similar. It could also be a `.csv`, `.rds`, or text log.
- Cross-check every single number. Compare to the R output with appropriate tolerance:
  - Coefficients and standard errors: match to the number of decimal places shown
  - Sample sizes: must match exactly
  - R-squared and similar: match to displayed precision
  - Percentages: verify the arithmetic (numerator/denominator)
- Check for rounding consistency - if a coefficient is 0.0347 in the R output and 0.035 in the paper, that is acceptable rounding. If it is 0.038, that is a discrepancy. (A tolerance helper is sketched after this list.)
- Verify that column headers, variable names, and panel labels in the paper match the specification in the code.
- Check that the number of observations (N) is consistent across all tables that use the same sample. If Table 1 reports N=4,521 and Table 3 uses the same sample but reports N=4,519, that needs explanation.
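A minimal sketch of the rounding/tolerance check, assuming you have already extracted a paper value (as printed) and the corresponding raw value from the R output. The helper name is illustrative, not from the project:

```r
# Compare a value as printed in the paper to the raw value in the R output,
# matching to the number of decimal places the paper displays.
# Hypothetical helper, for illustration only.
check_rounding <- function(paper_value, raw_value) {
  shown  <- as.character(paper_value)
  digits <- nchar(sub("^[^.]*\\.?", "", shown))   # decimal places displayed
  ok     <- isTRUE(abs(as.numeric(shown) - round(raw_value, digits)) < 1e-12)
  data.frame(paper = shown, raw = raw_value, digits = digits,
             status = if (ok) "PASS" else "FAIL")
}

check_rounding("0.035", 0.0347)  # PASS - acceptable rounding
check_rounding("0.038", 0.0347)  # FAIL - discrepancy
```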
Produce: A table-by-table verification report. For each table:
- Table number and title
- Source R script and output file
- Number of values checked
- List of any discrepancies with exact locations (paper line number, output file line number)
- PASS/FAIL status
## Phase 3: Inline Claims Audit
Read the paper body text (not just tables) and find every quantitative claim. These include:
- "We find a 3.2 percentage point increase..."
- "The effect is significant at the 5% level..."
- "Our sample includes 12,450 observations..."
- "Column 3 of Table 2 shows that..."
- "The coefficient on X is negative and significant..."
- Footnotes with numbers or statistical claims
- Abstract claims about magnitudes and significance
For each claim, trace it back to a specific table cell, figure, or R output. Flag any claim that cannot be traced or that contradicts the evidence.
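One way to build a first-pass candidate list is to grep the LaTeX source for numeric and significance language. A rough sketch, assuming the paper file identified in Phase 1 (the path below is the one used in the manifest example):

```r
# Flag lines in the paper body that contain numbers, percentages, or
# significance language, as candidates for the claims checklist.
# The file path is an assumption; use the paper file found in Phase 1.
tex <- readLines("paper/main.tex", warn = FALSE)
pattern <- "[0-9]+(\\.[0-9]+)?\\s*(percent|percentage point|pp|%)|significant|N\\s*=\\s*[0-9,]+"
candidates <- grep(pattern, tex, ignore.case = TRUE)
data.frame(line = candidates, text = trimws(tex[candidates]))
```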
Produce: A claims checklist with claim text, source location in paper, evidence source, and VERIFIED/UNVERIFIED/DISCREPANCY status.
## Phase 4: Code Review
Read every R script in the project, in execution order. This is not just a syntax check - you are auditing the analytical pipeline. Read `references/phase-details.md` Section 4 and `references/common-pitfalls.md` for what to look for.
Data Pipeline Verification:
- At every `merge`, `join`, `filter`, `subset`, or `mutate` step, check: (a) how many observations before vs. after the transformation? (b) do all column names needed downstream still exist? (c) are key summary statistics (mean, min, max, N) reasonable after the step? (A before/after check is sketched after this list.)
- Flag any joins that could silently drop or duplicate observations
- Flag any filters that might be too aggressive or too permissive
- Check for proper handling of missing values (NA) - are they dropped, imputed, or ignored?
- Verify that panel/time-series data is properly balanced or that imbalance is handled
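A minimal sketch of the before/after row-count check around a join, using dplyr on illustrative data (in the real audit, wrap these checks around the project's own joins and key columns):

```r
library(dplyr)

# Illustrative data; note the duplicate key (firm 2, year 2005) on the right.
panel      <- tibble(firm_id = c(1, 1, 2, 3), year = c(2004, 2005, 2005, 2005))
firm_chars <- tibble(firm_id = c(1, 2, 2), year = c(2005, 2005, 2005), assets = c(10, 20, 25))

n_before <- nrow(panel)
merged   <- left_join(panel, firm_chars, by = c("firm_id", "year"))
n_after  <- nrow(merged)

# A left join should keep the row count; growth signals duplicate keys, and
# NAs in the joined columns signal unmatched observations.
cat("rows before:", n_before, " after:", n_after, "\n")
cat("unmatched (NA assets):", sum(is.na(merged$assets)), "\n")
anti_join(panel, firm_chars, by = c("firm_id", "year"))  # rows that failed to match
```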
Modeling Decisions:
- Are the regression specifications consistent with what the paper describes? (e.g., if the paper says "we control for year fixed effects", is that in the code? See the sketch after this list.)
- Are standard errors clustered as described? (robust, clustered at the right level, etc.)
- Are instrumental variables correctly specified? (first stage, exclusion restriction checks)
- Is the sample restriction for each regression clearly defined and consistent with the paper?
- Are interaction terms, polynomials, or transformations correctly implemented?
- Do subsample analyses actually use the right subsamples?
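For example, if the paper says "year fixed effects with standard errors clustered by firm", the code should contain an estimation call equivalent to the following sketch. This uses fixest on simulated stand-in data; the project may instead use lfe, plm, or lm with sandwich, and all variable names here are placeholders:

```r
library(fixest)
set.seed(1)

# Simulated stand-in data; in the real audit, compare against the project's
# actual estimation call and analysis data.
analysis_df <- data.frame(
  firm_id   = rep(1:50, each = 6),
  year      = rep(2000:2005, times = 50),
  treatment = rbinom(300, 1, 0.5)
)
analysis_df$outcome <- 0.03 * analysis_df$treatment + rnorm(300)

# Year fixed effects, standard errors clustered by firm - the pattern to look
# for when the paper describes that specification.
m <- feols(outcome ~ treatment | year, data = analysis_df, cluster = ~firm_id)
summary(m)
nobs(m)  # compare against the N reported in the paper's table
```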
Robustness and Red Flags:
- Are there hardcoded values that should be computed? (e.g., `filter(year > 2005)` when the paper says "post-treatment period" without defining the cutoff)
- Are there commented-out lines that suggest alternative specifications were tried? (A mechanical scan for several of these patterns is sketched after this list.)
- Is there any evidence of p-hacking patterns (many specifications tried, only one reported)?
- Are random seeds set for any stochastic procedures?
- Are there warnings or errors being suppressed?
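A rough sketch of a mechanical scan for several of these red flags; the regexes are heuristics and every hit still needs manual review:

```r
# Rough red-flag scan over the project's R scripts; review every hit manually.
scripts <- list.files(".", pattern = "\\.R$", recursive = TRUE)
flags <- c(
  hardcoded_filter = "filter\\([^)]*[<>]=?\\s*[0-9]{4}",  # e.g. filter(year > 2005)
  commented_model  = "^\\s*#.*(lm\\(|feols\\(|glm\\()",    # commented-out specifications
  suppressed       = "suppressWarnings|suppressMessages",
  stochastic       = "sample\\(|rnorm\\(|bootstrap"        # check a set.seed() precedes these
)
for (f in scripts) {
  lines <- readLines(f, warn = FALSE)
  for (nm in names(flags)) {
    hits <- grep(flags[[nm]], lines, perl = TRUE)
    if (length(hits)) cat(f, "|", nm, "| lines:", paste(hits, collapse = ", "), "\n")
  }
}
```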
Produce: A script-by-script review with:
- Script name and purpose
- Data pipeline issues (with line numbers)
- Modeling decision flags (with line numbers)
- Red flags (with line numbers)
- Overall assessment: CLEAN / MINOR ISSUES / MAJOR ISSUES
## Phase 5: Build Verification Manifest
Create `verification_manifest.json` that maps every quantitative claim in the paper to the code that produces it.
Structure:
```json
{
  "paper_file": "paper/main.tex",
  "generated_at": "2026-02-08T12:00:00Z",
  "claims": [
    {
      "id": "T1_R2_C3",
      "type": "coefficient",
      "paper_location": {"file": "paper/main.tex", "line": 234, "context": "Table 1, Row 2, Col 3"},
      "paper_value": "0.035",
      "source_script": "code/02_main_regression.R",
      "source_line": 87,
      "output_file": "results/table1.tex",
      "output_location": {"line": 15, "context": "second coefficient in column 3"},
      "expected_value": "0.0347",
      "tolerance": 0.001,
      "status": "PASS",
      "notes": "Acceptable rounding from 0.0347 to 0.035"
    },
    {
      "id": "BODY_P12_S3",
      "type": "inline_claim",
      "paper_location": {"file": "paper/main.tex", "line": 412, "context": "paragraph 12, sentence 3"},
      "paper_value": "3.2 percentage points",
      "source_script": "code/02_main_regression.R",
      "source_line": 87,
      "output_file": "results/table1.tex",
      "output_location": {"line": 15},
      "expected_value": "0.032",
      "tolerance": 0.001,
      "status": "PASS",
      "notes": "Coefficient 0.0323 reported as 3.2pp"
    }
  ],
  "summary": {
    "total_claims": 142,
    "passed": 139,
    "failed": 2,
    "unverified": 1
  }
}
```
Every coefficient, standard error, sample size, p-value, summary statistic, and verbal claim should appear in this manifest. Be exhaustive.
## Phase 6: Replication Test Suite
Write `tests/verify_replication.R` that programmatically reruns the analysis and checks results against the manifest. Read `references/replication-script-template.md` for the template and structure; a minimal skeleton is also sketched after the requirements list below.
The test script must:
- Source or rerun each analysis script in the correct order
- Extract the relevant outputs (coefficients, SEs, N, R-squared, etc.)
- Compare against the values in `verification_manifest.json`
- Use appropriate tolerance for floating-point comparisons
- Report PASS/FAIL for each claim with clear diagnostics on failure
- Handle dependencies gracefully (if a data file is missing, report it, do not crash)
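A minimal skeleton that satisfies these requirements, assuming jsonlite is available and the manifest format from Phase 5; the value-extraction step is claim-specific and is left as a clearly marked placeholder:

```r
# tests/verify_replication.R - minimal skeleton; adapt the extraction logic per project.
library(jsonlite)

manifest <- fromJSON("verification_manifest.json", simplifyDataFrame = FALSE)

results <- lapply(manifest$claims, function(claim) {
  if (!file.exists(claim$output_file)) {
    # Handle missing dependencies gracefully: report, do not crash.
    return(list(id = claim$id, status = "UNVERIFIED",
                note = paste("missing file:", claim$output_file)))
  }
  # Extraction is claim-specific: rerun the source script or parse the output
  # file and pull the value named in the claim. The placeholder returns NA so
  # the claim is reported UNVERIFIED until real extraction is wired in.
  observed <- NA_real_
  expected <- as.numeric(claim$expected_value)
  status <- if (!is.finite(observed)) {
    "UNVERIFIED"
  } else if (abs(observed - expected) <= claim$tolerance) {
    "PASS"
  } else {
    "FAIL"
  }
  list(id = claim$id, status = status, observed = observed, expected = expected)
})

write_json(results, "tests/replication_results.json", auto_unbox = TRUE, pretty = TRUE)
cat(sum(vapply(results, function(r) r$status == "PASS", logical(1))), "of",
    length(results), "claims passed\n")
```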
After writing the test script:
- Run it
- For any failures, diagnose the root cause
- If the failure is due to a code bug (not a paper-code mismatch), fix the upstream script and document what you fixed
- Rerun until all tests pass or all remaining failures are genuine paper-code discrepancies
- Produce a final summary
Produce:
- `tests/verify_replication.R` - the test script
- `tests/replication_results.json` - structured test results
- `tests/replication_summary.md` - human-readable summary of what passed, what failed, what was fixed, and what remains unresolved
## Output Format
At the end of the full verification, produce a consolidated report. Use this structure:
# Paper Verification Report
## Executive Summary
- Total quantitative claims checked: X
- Passed: Y
- Failed: Z
- Unverified: W
- Code issues found: N (M major, K minor)
## Table-by-Table Results
[from Phase 2]
## Inline Claims Results
[from Phase 3]
## Code Review Findings
[from Phase 4]
## Replication Test Results
[from Phase 6]
## Recommendations
[prioritized list of issues to address]
## Important Notes
- Never silently skip a number. If you cannot verify a value, mark it UNVERIFIED with an explanation.
- When in doubt, flag it. False positives are better than missed discrepancies.
- Pay special attention to N (sample sizes) - these are the most common source of inconsistencies across tables and text.
- If the project uses R packages that produce formatted output (stargazer, modelsummary, etc.), check the raw model objects too, not just the formatted output.
- If you encounter Stata `.do` files or Python scripts mixed in, verify those too using the same principles.
- The user may want you to run this on a subset (e.g., "just check Table 3"). Adapt accordingly, but note what was not checked.