SKILL.md — Paper2Protocol Skill Definition
Version: 1.2 | Created: 2026-03-20 | License: CC BY-NC 4.0
Overview
Paper2Protocol reverse-engineers complete experimental validation plans from published, high-impact primary literature, turning scientific discoveries into executable research protocols.
Core Principle: Only use primary sources (PMC full-text, journal PDFs), never abstracts or second-hand reviews.
Input Requirements
✅ Accepted
- PMC full-text (NCBI PubMed Central, Open Access)
- Journal website PDFs (Nature/Science/Cell, peer-reviewed)
- DeepReader-generated full-text analysis documents
❌ Rejected
- Abstracts only
- News articles / media interpretations
- Review articles (as primary input)
- AI-generated summaries (not based on primary sources)
Input Formats
- PMC URL → Auto-fetch full text
- PDF file → Direct analysis
- Paper title → Search PMC for full text
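The routing of these three input formats could be sketched as follows. This is only an illustration: the names `classify_input` and `efetch_url`, the regular expression, and the choice of the NCBI E-utilities `efetch` endpoint for the auto-fetch step are assumptions, not part of the skill definition.

```python
import re
from urllib.parse import urlencode

# Assumed pattern: PMC article URLs contain an ID such as "PMC1234567".
PMC_ID = re.compile(r"PMC(\d+)", re.IGNORECASE)


def classify_input(value: str) -> tuple[str, str]:
    """Return (kind, payload), where kind is 'pmc', 'pdf', or 'title'."""
    text = value.strip()
    match = PMC_ID.search(text)
    if text.lower().startswith(("http://", "https://")) and match:
        return "pmc", "PMC" + match.group(1)   # auto-fetch full text
    if text.lower().endswith(".pdf"):
        return "pdf", text                     # direct analysis
    return "title", text                       # search PMC for full text


def efetch_url(pmc_id: str) -> str:
    """Build an E-utilities efetch URL for a PMC full-text record (no request is sent)."""
    numeric_id = pmc_id.upper().removeprefix("PMC")
    query = urlencode({"db": "pmc", "id": numeric_id})
    return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + query
```

Anything that is neither a PMC URL nor a PDF path falls through to the title search, which mirrors the order of the list above.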
Workflow (5 Stages)
Stage 1: Source Acquisition & Quality Assessment
- Validate input as primary source
- Fetch full text (PMC API / PDF parsing)
- Quality rating:
  - Journal tier (CNS / sub-journal / field-top / other)
  - Research type (basic / clinical / translational)
  - Data completeness (supplementary materials, raw data links)
  - Reproducibility (method detail, sample size)
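The quality rating could be captured in a small record like the one below. This is a minimal sketch: the class and field names are hypothetical, and the skill does not prescribe any scoring scheme, so the example only reports which checks failed.

```python
from dataclasses import dataclass
from enum import Enum


class JournalTier(Enum):
    CNS = "CNS"
    SUB_JOURNAL = "sub-journal"
    FIELD_TOP = "field-top"
    OTHER = "other"


class ResearchType(Enum):
    BASIC = "basic"
    CLINICAL = "clinical"
    TRANSLATIONAL = "translational"


@dataclass
class QualityAssessment:
    journal_tier: JournalTier
    research_type: ResearchType
    has_supplementary: bool      # data completeness
    has_raw_data_links: bool     # data completeness
    methods_detailed: bool       # reproducibility
    sample_size_reported: bool   # reproducibility

    def gaps(self) -> list[str]:
        """List the completeness and reproducibility checks that failed."""
        checks = {
            "supplementary materials": self.has_supplementary,
            "raw data links": self.has_raw_data_links,
            "method detail": self.methods_detailed,
            "sample size": self.sample_size_reported,
        }
        return [name for name, ok in checks.items() if not ok]
```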
Stage 2: Scientific Logic Deconstruction
Extract complete scientific logic:
- Core Scientific Question: What problem does this paper solve?
- Research Strategy: Hypothesis, models (in vivo/in vitro/in silico/clinical), key techniques
- Validation Chain: Hypothesis → Key Experiment 1 → Key Experiment 2 → ... → Conclusion. Annotate purpose and expected outcome at each node.
- Innovation Analysis: Methodological, conceptual, and application innovations.
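One possible representation of an annotated validation chain is sketched below. The `ChainNode` type and `render_chain` helper are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ChainNode:
    label: str             # e.g. "Hypothesis", "Key Experiment 1", "Conclusion"
    purpose: str           # why this node exists in the chain
    expected_outcome: str  # what result would support the next step


def render_chain(nodes: list[ChainNode]) -> str:
    """Render the chain as an arrow line followed by one annotation per node."""
    arrow_line = " → ".join(node.label for node in nodes)
    notes = [
        f"- {node.label}: purpose = {node.purpose}; expected = {node.expected_outcome}"
        for node in nodes
    ]
    return "\n".join([arrow_line, *notes])
```

Keeping purpose and expected outcome on each node means a missing annotation shows up as an empty field rather than being lost in free text.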
Stage 3: Executable Experimental Paths
3.1 Experiment Layering
- Must-do: Core experiments validating the hypothesis
- Should-do: Supporting experiments
- Nice-to-do: Mechanism deep-dives or scope extensions
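The three layers map naturally onto an ordered enumeration, so experiments can be scheduled core-first. The sketch below assumes hypothetical names (`Tier`, `order_by_tier`); the skill itself only defines the three labels.

```python
from enum import IntEnum


class Tier(IntEnum):
    MUST_DO = 1      # core experiments validating the hypothesis
    SHOULD_DO = 2    # supporting experiments
    NICE_TO_DO = 3   # mechanism deep-dives or scope extensions


def order_by_tier(experiments: list[tuple[str, Tier]]) -> list[str]:
    """Return experiment names sorted by tier; ties keep their input order."""
    return [name for name, _tier in sorted(experiments, key=lambda item: item[1])]
```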
3.2 Per-Experiment Details
| Field | Content |
|---|---|
| Experiment Name | Specific name |
| Purpose | Role in validation chain |
| Method | Detailed protocol (paper Methods + best practices) |
| Samples/Materials | Cell lines, animal models, clinical samples |
| Sample Size | Statistically required minimum |
| Key Reagents | Brand, catalog reference, concentration |
| Equipment | Required instruments + alternatives |
| Expected Results | Positive/negative controls, data type |
| Timeline | Per-experiment duration + replicates |
| Budget | Reagents + consumables + services |
| Risk Assessment | Failure causes + backup plans |
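For the Sample Size field, a first estimate of the statistically required minimum can come from a standard power calculation. The sketch below uses the normal approximation for a two-sided, two-group comparison of means, n per group = 2((z_{1-α/2} + z_{power}) / d)², where d is the standardized effect size. The function name and default values are assumptions, and the approximation slightly underestimates the n required by an exact t-test calculation, so a dedicated power-analysis tool should confirm the final number.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group sample size for a two-sided two-sample comparison."""
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)
```

With the defaults, `n_per_group(0.5)` returns 63 per group.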
3.3 Bioinformatics Analysis (if applicable)
| Field | Content |
|---|---|
| Analysis Goal | Specific task |
| Data Source | Public databases (TCGA/GEO) or generated data |
| Tools | Recommended pipeline (R/Python/online) |
| Key Parameters | Standard settings |
| Expected Output | Figure types, statistics |
| Compute Resources | Local/server/cloud requirements |
3.4 Bioinformatics Code (REQUIRED when analysis involves bioinformatics)
When experiments involve bioinformatics, complete runnable code MUST be provided.
Requirements:
- Language: R (Bioconductor) or Python (R preferred)
- Completeness: End-to-end, data download to publication figures
- Comments: Key steps annotated in English
- Data Sources: Prioritize public databases (TCGA, GEO, Beat-AML)
- Standard Tools: ssGSEA/GSEA, DESeq2, CIBERSORTx/xCell, survival, ComplexHeatmap
- Statistical Rigor: Multiple testing correction (BH), power analysis
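The Benjamini-Hochberg (BH) correction named above can be illustrated in a few lines. This is a plain-Python sketch for clarity, with a hypothetical function name; generated analysis code would normally rely on an established implementation such as R's `p.adjust(method = "BH")`.

```python
def bh_adjust(p_values: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```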
Coverage:
- Subtype Classification: ssGSEA + K-means/Hierarchical clustering
- Differential Expression: DESeq2/edgeR → volcano plot
- Survival Analysis: Kaplan-Meier + Cox regression + ROC (timeROC)
- Gene Enrichment: GSEA + ssGSEA + Hallmark/Immunologic gene sets
- Immune Microenvironment: CIBERSORTx/xCell deconvolution
- Heatmaps: ComplexHeatmap / pheatmap
- Prognostic Models: LASSO Cox (glmnet) + Nomogram (rms)
- Flow Cytometry: FlowJo export → Python statistical analysis
- Panel Selection: LASSO + Random Forest intersection → minimal gene set
- Automation: Bash shell script to chain all analysis steps
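As an illustration of the Survival Analysis item, the Kaplan-Meier estimator multiplies, at each distinct event time, the running survival probability by (1 − deaths / at risk). The sketch below is a plain-Python teaching version with a hypothetical function name; delivered code would be expected to use the standard tools listed above (for example the R `survival` package).

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return (event_time, survival_probability) pairs for each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)  # still under observation at t
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Censored subjects contribute to the at-risk count until their censoring time but never trigger a step in the curve.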
3.5 Budget Summary
Phase 1 (Core Validation): $XX,XXX
- Reagents: $X,XXX
- Consumables: $X,XXX
- Services (sequencing): $XX,XXX
- Animals: $X,XXX
Phase 2 (Mechanism): $XX,XXX
...
Total: $XXX,XXX – $XXX,XXX
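The roll-up from line items to a total range could be computed as sketched below. The function name and the (low, high) tuple representation of each estimate are assumptions made for this example.

```python
def summarize_budget(
    phases: dict[str, dict[str, tuple[int, int]]]
) -> dict[str, tuple[int, int]]:
    """Sum (low, high) cost estimates per phase and add an overall 'Total' entry."""
    summary: dict[str, tuple[int, int]] = {}
    total_low = 0
    total_high = 0
    for phase, items in phases.items():
        low = sum(estimate[0] for estimate in items.values())
        high = sum(estimate[1] for estimate in items.values())
        summary[phase] = (low, high)
        total_low += low
        total_high += high
    summary["Total"] = (total_low, total_high)
    return summary
```

Carrying a low and a high estimate per item is what produces the range shown on the Total line.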
Stage 4: Extension Projects (2-3 proposals)
Each includes:
- Project Name
- Scientific Question
- Innovation vs original paper
- Feasibility: ⭐ rating (technical difficulty, resources, timeline)
- Expected Outcomes: Paper tier, patent potential, clinical value
- Risk Assessment: Bottlenecks and failure risks
Stage 5: Multi-Paper Synthesis (Accumulation Mode)
Triggered when ≥3 papers accumulate per topic:
- By Scientific Question: Group papers by shared research questions
- By Method: Rank techniques by frequency → prioritize platform setup
- Integrated Roadmap: Deduplicate protocols, consolidate budgets
- Research Timeline: 12-month plan based on synthesis
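The trigger condition and the frequency ranking of techniques could look like the sketch below. The function names are hypothetical, and counting each technique at most once per paper is an assumption about how "frequency" should be read.

```python
from collections import Counter

SYNTHESIS_THRESHOLD = 3  # from the skill: triggered when >= 3 papers accumulate per topic


def topics_ready(papers_by_topic: dict[str, list[str]]) -> list[str]:
    """Return topics that have accumulated enough papers for synthesis."""
    return [
        topic
        for topic, papers in papers_by_topic.items()
        if len(papers) >= SYNTHESIS_THRESHOLD
    ]


def rank_methods(methods_per_paper: list[list[str]]) -> list[tuple[str, int]]:
    """Rank techniques by the number of papers using them (ties sorted by name)."""
    counts = Counter(
        method for methods in methods_per_paper for method in set(methods)
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```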
Output Format
Standard Structure
# 📋 [Paper Title] → Experimental Validation Plan
## 📄 Paper Information
## 🔬 Part 1: Validation Logic
## 🧪 Part 2: Executable Experimental Paths
## 💻 Part 3: Bioinformatics Code (if applicable)
## 🚀 Part 4: Extension Projects
## 📝 Execution Recommendations
Output Formats
- Markdown (default)
- PDF Report (HTML → browser print, preserving all tables and code blocks)
- Any document platform (Feishu, Notion, etc.)
Storage & Indexing
literature-to-experiment/
├─ index.json
├─ by_project/
│  └─ [Project Name]/
│     └─ PMCxxxxxx_protocol.md
├─ by_topic/
│  └─ [Topic Name]/
└─ summaries/
   └─ [Topic]_synthesis.md
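Writing a protocol into this layout and registering it in `index.json` might be done as sketched below. The skill does not specify the schema of `index.json`, so the entry format here (keyed by PMC ID, with `project` and `path` fields) is purely an assumption, as is the function name `save_protocol`.

```python
import json
from pathlib import Path


def save_protocol(root: Path, project: str, pmc_id: str, markdown: str) -> Path:
    """Write by_project/<project>/<pmc_id>_protocol.md and update index.json."""
    path = root / "by_project" / project / f"{pmc_id}_protocol.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")

    index_path = root / "index.json"
    if index_path.exists():
        index = json.loads(index_path.read_text(encoding="utf-8"))
    else:
        index = {}
    # Hypothetical index entry: the real schema is not defined by the skill.
    index[pmc_id] = {"project": project, "path": path.relative_to(root).as_posix()}
    index_path.write_text(
        json.dumps(index, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return path
```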
Notes
- Pricing: Based on 2025-2026 market rates; mark every figure as a "reference price"
- Sample Size: Follows statistical principles, power analysis recommended
- Ethics: Mark IRB/IACUC requirements for human/animal studies
- Timeliness: Flag methods >5 years old for verification
- Code: Must provide complete runnable code for bioinformatics analyses
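The Timeliness rule reduces to a simple age check, sketched below. Comparing calendar years only is an assumption; the note does not say whether age is measured from the publication date or at finer granularity, and the function name is hypothetical.

```python
from datetime import date
from typing import Optional

MAX_METHOD_AGE_YEARS = 5  # threshold stated in the Timeliness note


def needs_verification(method_year: int, today: Optional[date] = None) -> bool:
    """Return True when a method is more than five calendar years old."""
    current_year = (today or date.today()).year
    return current_year - method_year > MAX_METHOD_AGE_YEARS
```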
Dependencies
- DeepReader: Full-text analysis (pre-requisite step)
- academic-paper: If integrating plans into papers
License
CC BY-NC 4.0 — Free for academic use with attribution. No commercial use without permission.
Authors
- Jiacheng Lou (GitHub)
- 🦞 Claw (AI Research Assistant)