name: agent-research-aggregator description: Pre-pipeline aggregator that scans AI agent cache directories (.claude, .cursor, .antigravity, .openclaw) or any user-specified directory for experimentation logs, extracts insights and numeric results, and formats them as PaperOrchestra-ready inputs (idea.md + experimental_log.md). TRIGGER when the user says "aggregate my agent logs for paper writing", "extract experiments from my coding agent history", "prepare PaperOrchestra inputs from my cache", "turn my agent logs into a paper", mentions a folder or directory they want to use as the basis for a paper, or wants to run PaperOrchestra but only has scattered agent experiment histories rather than structured inputs. Run this BEFORE paper-orchestra. Also called automatically by paper-orchestra when workspace/inputs/idea.md or workspace/inputs/experimental_log.md are missing.
agent-research-aggregator
Should I run? (decision gate)
Before starting Phase 1, check whether aggregation is actually needed:
| Situation | Action |
|---|---|
workspace/inputs/idea.md and workspace/inputs/experimental_log.md both exist and are non-empty | Skip this skill entirely. Proceed directly to paper-orchestra. |
| Either file is missing or empty, and the user provided a directory path | Run this skill with that directory as --search-roots. |
| Either file is missing or empty, and no directory was provided | Scan cwd and ~ by default; show the discovery summary to the user before continuing. |
| The inputs exist but look thin (e.g. idea.md has < 5 lines, no numeric data in experimental_log.md) | Ask the user whether to supplement with aggregation or proceed as-is. |
The skill is intentionally a pre-pass — it is cheap to skip and should only run when the structured inputs don't already exist.
A pre-processing skill for PaperOrchestra (arXiv:2604.05018). Reads scattered
experimentation artifacts from AI coding-agent cache directories and synthesizes
them into the structured (I, E) input pair the PaperOrchestra pipeline expects.
[.claude/] [.cursor/] [.antigravity/] [.openclaw/]
│ │ │ │
└────────────┴──────────────┴───────────────┘
│
Phase 1: Discovery
(discover_logs.py)
│
discovered_logs.json
│
Phase 2: Extraction
(LLM call per log batch)
│
raw_experiments.json
│
Phase 3: Synthesis
(LLM call — consolidate)
│
synthesis.json
│
Phase 4: Formatting
(format_po_inputs.py)
│
┌────────────┴────────────┐
workspace/inputs/ workspace/ara/
idea.md aggregation_report.md
experimental_log.md discovered_logs.json
raw_experiments.json
synthesis.json
The output drops directly into workspace/inputs/ so the user can immediately
run paper-orchestra on the same workspace.
Inputs
| Parameter | Required | Default | Description |
|---|---|---|---|
--search-roots | no | cwd, ~ | Comma-separated directories to scan for agent caches |
--agents | no | all | Comma-separated subset: claude,cursor,antigravity,openclaw |
--workspace | no | ./workspace | PaperOrchestra workspace root |
--depth | no | 4 | Max directory scan depth (prevents runaway scans on large home dirs) |
--since | no | none | Only include logs modified after this date (ISO 8601: 2025-01-01) |
The user specifies these when invoking the skill, or you may ask them for
--search-roots if the current directory has no detectable agent caches.
Phase 1 — Discovery (deterministic)
Run the discovery script to catalog every relevant log file:
python skills/agent-research-aggregator/scripts/discover_logs.py \
--search-roots <roots> \
--agents <agents> \
--depth <depth> \
--since <since> \
--out workspace/ara/discovered_logs.json
The script exits with code 2 when no --project filter is set (this is
expected on the first run). It prints a "Projects found" list to stdout —
show it to the user immediately.
If no logs are found at all: stop and ask the user to specify
--search-roots or point you at a directory that contains agent cache folders.
Phase 1.5 — Project Selection (mandatory)
A paper can only be written from a single project. You must ask the user which project to use before any LLM processing begins.
- Display the numbered project list from the discovery summary, e.g.:
Projects found: [1] /home/alice/projects/my-rl-experiment (42 files) [2] /home/alice/projects/llm-eval-suite (17 files) [3] /home/alice/projects/old-demo (3 files) - Ask: "Which project should this paper be based on? Please choose a number or paste the project path."
- Do not proceed to Phase 2 until the user has answered.
- Re-run discovery with the chosen project to filter the manifest:
python skills/agent-research-aggregator/scripts/discover_logs.py \
--search-roots <roots> \
--agents <agents> \
--depth <depth> \
--since <since> \
--project "<chosen project path>" \
--out workspace/ara/discovered_logs.json
This overwrites discovered_logs.json so only the selected project's files
remain. The script exits 0 on success.
If the discovery finds only one project: skip the question and inform the
user: "Only one project found: <path>. Using it for the paper." — then
re-run with --project automatically.
If the discovery summary shows irrelevant files after filtering: ask the user whether to include or exclude them before continuing to Phase 2. Err on the side of inclusion — the extraction prompt is conservative.
Phase 2 — Extraction (LLM-assisted)
Process discovered logs in batches (group by agent type; keep batches under ~50 KB of raw text to stay within context limits):
For each batch:
- Read the log files in the batch (the script's
--listoutput tells you which file paths to read). - Apply the extraction prompt from
references/extraction-prompt.mdas your system message. - Pass the raw log text as the user message.
- Collect the structured JSON the LLM returns (see schema in the prompt).
- Append to
workspace/ara/raw_experiments.json.
After all batches:
python skills/agent-research-aggregator/scripts/extract_experiments.py \
--discovered workspace/ara/discovered_logs.json \
--out workspace/ara/raw_experiments.json \
--validate-only
Run this in --validate-only mode to check the combined JSON is well-formed
and meets the minimum schema (experiments array non-empty, each entry has
hypothesis or method or results). Fix any malformed entries before Phase 3.
Phase 3 — Synthesis (LLM-assisted)
Consolidate possibly-redundant experiment records from multiple agent caches into a single coherent research narrative. This is ONE LLM call.
System message: Use references/synthesis-prompt.md verbatim.
User message:
<raw_experiments>
{contents of workspace/ara/raw_experiments.json}
</raw_experiments>
The LLM must return a synthesis.json with keys:
research_question— the overarching question being investigatedhypothesis— the core proposed solution / claimmethod_summary— how the approach works (concise, no data leakage)key_contributions— 2–5 bullet stringsexperimental_setup— datasets, metrics, baselines, implementation notesresults_tables— array of{title, headers[], rows[]}markdown-table objectsqualitative_observations— free-form text blocks (what worked, what didn't, failure modes, ablation insights)iteration_history— ordered list of{iteration_id, change_description, outcome}entries if multiple iterations are detectedopen_questions— questions that remain unanswered in the logs
Save to workspace/ara/synthesis.json.
Note: By this point, the user has already selected a single project in Phase 1.5. The synthesis should represent one coherent research thread. If the LLM still surfaces multiple disconnected research questions, flag this as a data quality warning in the audit report (Phase 5) but do not re-ask for project selection — that decision was made earlier.
Phase 4 — Formatting (deterministic)
Convert synthesis.json into PaperOrchestra input files:
python skills/agent-research-aggregator/scripts/format_po_inputs.py \
--synthesis workspace/ara/synthesis.json \
--out workspace/inputs/
This generates two files:
workspace/inputs/idea.md (Sparse variant)
Follows the PaperOrchestra Sparse Idea format (arXiv:2604.05018, §3.1):
# [Synthesized Research Title]
## Problem
<2–4 sentence problem statement derived from research_question>
## Hypothesis
<hypothesis from synthesis>
## Method
<method_summary from synthesis>
## Key Contributions
<key_contributions as bullet list>
## Open Questions
<open_questions, if any>
workspace/inputs/experimental_log.md
Follows the PaperOrchestra Experimental Log format (App. D.3):
## 1. Experimental Setup
<experimental_setup from synthesis, formatted as prose + sub-bullets>
## 2. Raw Numeric Data
<results_tables converted to GitHub-Flavored Markdown tables>
## 3. Qualitative Observations
<qualitative_observations from synthesis>
### Iteration History
<iteration_history as an ordered narrative, if present>
After running the script, review both files with the user:
- Read
workspace/inputs/idea.mdaloud and ask: "Does this accurately capture your research question and method?" - Read the table headers from
workspace/inputs/experimental_log.mdand ask: "Are these the correct metrics and baselines?"
Revise based on feedback before proceeding to PaperOrchestra.
Phase 5 — Audit Report (deterministic)
python skills/agent-research-aggregator/scripts/format_po_inputs.py \
--synthesis workspace/ara/synthesis.json \
--out workspace/inputs/ \
--report workspace/ara/aggregation_report.md
The --report flag makes the script also write aggregation_report.md, which
contains:
- Number of agent caches scanned, files read, batches processed
- Per-agent breakdown (files found per agent type)
- Experiment records extracted (count, date range)
- Iterations detected (count, convergence direction)
- Data quality warnings (gaps, low-confidence extractions, conflicting numbers)
- Files written and their sizes
Show the report to the user. If the data quality section lists warnings, discuss them before running paper-orchestra — garbage in, garbage out.
Handoff to PaperOrchestra
Once the user has confirmed idea.md and experimental_log.md, the workspace
is ready for the paper-orchestra pipeline. You still need:
| File | Status | Action |
|---|---|---|
workspace/inputs/idea.md | ✓ generated | user review recommended |
workspace/inputs/experimental_log.md | ✓ generated | user review recommended |
workspace/inputs/template.tex | MISSING | ask user to provide their conference LaTeX template |
workspace/inputs/conference_guidelines.md | MISSING | ask user to provide (page limit, deadline, formatting rules) |
Tell the user exactly which two files are still needed, then offer to run
paper-orchestra once they supply them.
Error handling
| Situation | Action |
|---|---|
| Cache directory does not exist | Skip silently; note in report |
| File is binary or non-text | Skip; note in report |
| File > 200 KB | Truncate at 200 KB; note in report with path |
| LLM extraction returns malformed JSON | Re-prompt once with the parse error appended; if still malformed, log the batch as status: failed and continue |
Synthesis returns > 1 research_question | Log as data quality warning in audit report; do not re-ask for project (was selected in Phase 1.5) |
results_tables is empty after synthesis | Warn the user — PaperOrchestra's section-writing agent needs numeric data |
Hard rules (never violate)
- Never write to agent cache directories. This skill is read-only on
.claude/,.cursor/,.antigravity/,.openclaw/. - Never include personal information (emails, names, credentials, API keys) in generated
idea.mdorexperimental_log.md. The extraction prompt instructs the LLM to strip PII; double-check before handoff. - Never fabricate results. If a metric appears in only one log with low confidence, mark it
[UNVERIFIED]in the table rather than silently including it. - Never proceed past Phase 1 without user confirmation of the discovered file list if the scan found > 50 files.
Quick reference
# Phase 1: discover all projects (exits with code 2 — project selection required)
python skills/agent-research-aggregator/scripts/discover_logs.py \
--search-roots . ~ --out workspace/ara/discovered_logs.json
# Phase 1.5: re-run with chosen project (exits 0)
python skills/agent-research-aggregator/scripts/discover_logs.py \
--search-roots . ~ \
--project "/home/user/projects/my-chosen-project" \
--out workspace/ara/discovered_logs.json
# ... (Phase 2: LLM extraction calls, see above) ...
python skills/agent-research-aggregator/scripts/extract_experiments.py \
--discovered workspace/ara/discovered_logs.json \
--out workspace/ara/raw_experiments.json --validate-only
# ... (Phase 3: LLM synthesis call, see above) ...
python skills/agent-research-aggregator/scripts/format_po_inputs.py \
--synthesis workspace/ara/synthesis.json \
--out workspace/inputs/ \
--report workspace/ara/aggregation_report.md