---
name: brain-in-the-fish
description: Universal document evaluation engine — evaluate any document against any criteria using cognitively-modelled AI agents with ontology-grounded scoring
version: 0.1.0
---
Brain in the Fish — MCP Skill Guide
What This Does
Brain in the Fish evaluates documents (essays, policies, contracts, clinical reports, surveys) against evaluation criteria using a panel of AI agents. Each agent's mental state is represented as an OWL ontology. Scoring is grounded in an Evidence Density Scorer (EDS) that makes hallucination mathematically detectable.
MCP Tools Available
| Tool | Purpose | When to Call |
|---|---|---|
| eval_status | Check server status and session state | First — verify server is running |
| eval_ingest | Ingest a document (PDF/text) | Step 1 |
| eval_criteria | Load evaluation framework | Step 2 |
| eval_align | Align document sections to criteria | Step 3 |
| eval_spawn | Generate evaluator agent panel | Step 4 |
| eval_scoring_tasks | Get all scoring prompts for subagents | Step 5 |
| eval_score_prompt | Get scoring prompt for one agent/criterion pair | Step 5 (per-task) |
| eval_record_score | Record a score from an agent | Step 6 |
| eval_debate_status | Check disagreements and convergence | Step 7 |
| eval_challenge_prompt | Get challenge prompt for debate | Step 7 (per-challenge) |
| eval_report | Generate final evaluation report | Step 8 |
| eval_whatif | "What if" re-scoring with modified text | Optional |
Evaluation Workflow
Quick Mode (deterministic, no subagents needed)
eval_ingest → eval_criteria → eval_align → eval_spawn → eval_report
The server runs evidence scoring internally. eval_report produces a complete evaluation with deterministic scores.
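As a sketch only, the Quick Mode chain can be scripted from any MCP client. The example below assumes an already-connected ClientSession from the MCP Python SDK; the tool names and argument names (path, intent, framework_or_intent) come from this guide, while passing empty argument dicts to the remaining tools is an assumption.

```python
# A minimal Quick Mode sketch, assuming an already-connected MCP ClientSession
# (official Python SDK). Tool and argument names follow this guide; passing {}
# where a tool takes no arguments is an assumption.
async def quick_mode(session, path: str, intent: str):
    await session.call_tool("eval_ingest", {"path": path, "intent": intent})
    await session.call_tool("eval_criteria", {"framework_or_intent": intent})
    await session.call_tool("eval_align", {})
    await session.call_tool("eval_spawn", {"intent": intent})
    # Evidence scoring has already run server-side, so the report returns
    # deterministic scores without any subagent involvement.
    return await session.call_tool("eval_report", {})
```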
Full Mode (with Claude subagent scoring)
1. eval_ingest(path, intent)
2. eval_criteria(framework_or_intent)
3. eval_align()
4. eval_spawn(intent)
5. eval_scoring_tasks() → get all tasks
6. For each task:
   - Read the scoring prompt
   - Evaluate the document content against the criterion as the agent persona
   - Call eval_record_score(agent_id, criterion_id, score, justification, evidence, gaps)
7. eval_debate_status() → check for disagreements
8. If there are disagreements:
   - eval_challenge_prompt(challenger, target, criterion)
   - Generate a challenge argument
   - Call eval_record_score() with the revised score
   - Repeat until converged
9. eval_report() → final report
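Pulled together, the loop might look like the sketch below. It again assumes an MCP ClientSession; the way tasks are read out of the tool result and the score_as_persona helper (standing in for running a Claude subagent over one scoring prompt) are hypothetical.

```python
# A minimal Full Mode sketch. session is an MCP ClientSession; score_as_persona
# is a hypothetical helper that runs a Claude subagent over one scoring prompt
# and returns the fields shown under "Response Format for eval_record_score".
async def full_mode(session, path: str, intent: str):
    await session.call_tool("eval_ingest", {"path": path, "intent": intent})
    await session.call_tool("eval_criteria", {"framework_or_intent": intent})
    await session.call_tool("eval_align", {})
    await session.call_tool("eval_spawn", {"intent": intent})

    tasks = await session.call_tool("eval_scoring_tasks", {})
    for task in tasks.content:                       # result shape is an assumption
        assessment = await score_as_persona(task)    # hypothetical subagent call
        await session.call_tool("eval_record_score", assessment)

    await session.call_tool("eval_debate_status", {})
    # If the server reports disagreements: fetch eval_challenge_prompt for each
    # challenger/target/criterion, run challenge subagents, record revised
    # scores, and re-check eval_debate_status until it converges.

    return await session.call_tool("eval_report", {})
```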
Subagent Dispatch Pattern
When orchestrating with multiple Claude subagents:
```
Orchestrator reads eval_scoring_tasks()
  → For each agent in the panel:
      Dispatch subagent with system prompt from eval_scoring_tasks
      Subagent receives: persona, criteria, document sections
      Subagent calls eval_record_score with their assessment
  → After all scores recorded:
      Check eval_debate_status
      If disagreements: dispatch challenge subagents
  → eval_report for final output
```
Scoring Guidelines for Subagents
When scoring as an agent persona:
- Read the document content provided in the scoring prompt carefully
- Reference the rubric levels — state which level the document meets
- Cite specific evidence from the document text (quote directly)
- Identify gaps — what's missing that would improve the score
- Be the persona — a Subject Expert scores differently from a Writing Specialist
- Do not hallucinate — only reference evidence that appears in the provided text
- Use the full scale — don't cluster all scores at 6-8; use the 1-10 range appropriately
Response Format for eval_record_score
```json
{
  "agent_id": "from the scoring task",
  "criterion_id": "from the scoring task",
  "score": 7.5,
  "max_score": 10.0,
  "round": 1,
  "justification": "Detailed justification referencing specific document content and rubric levels. This section meets Level 3 (score range 6-8) because it demonstrates [specific evidence]. To reach Level 4, the document would need [specific improvement].",
  "evidence_used": ["Direct quote from document", "Another quote"],
  "gaps_identified": ["Missing topic X", "No counter-argument for claim Y"]
}
```
Supported Document Types
| Type | Intent Keywords | Framework Auto-Selected |
|---|---|---|
| Academic essay | "essay", "mark", "grade", "coursework" | Academic Essay Marking |
| Policy document | "policy", "green book", "impact assessment" | HM Treasury Green Book |
| Survey/research | "survey", "methodology", "questionnaire" | Survey Methodology |
| Contract/legal | "contract", "legal", "compliance" | Contract Review |
| Clinical/NHS | "nhs", "clinical", "patient", "governance" | NHS Clinical Governance |
| GCSE English | "gcse", "english language" | GCSE English Language |
| Generic | anything else | Generic Quality |
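For example, a free-text intent is enough for the server to pick a framework from the keywords above. A minimal sketch, assuming the same ClientSession as earlier and that eval_criteria accepts the intent text directly:

```python
# A minimal sketch of framework auto-selection. The intent text is illustrative;
# per the table above, "grade" and "coursework" should map to Academic Essay Marking.
async def pick_framework(session):
    return await session.call_tool(
        "eval_criteria",
        {"framework_or_intent": "grade this undergraduate coursework essay"},
    )
```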
Architecture Notes
- Three ontologies coexist in one Oxigraph triple store: Document, Criteria, Agent
- Evidence scorer provides deterministic evidence-grounded scoring baseline
- Validation signals (citations, structure, reading level, fallacies, hedging) feed into the scorer as spikes
- Epistemic state tracks justified beliefs with empirical/normative/testimonial bases
- Philosophical analysis applies Kantian/utilitarian/virtue ethics lenses
- Belief dynamics — agents' Maslow needs update based on findings, and trust evolves during debate
- Cross-evaluation memory persists results for historical comparison
- All triples are queryable via SPARQL through the underlying onto_* tools
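As a final illustration, the combined graph can be explored with a vocabulary-agnostic query. The tool name onto_query below is a placeholder for whichever onto_* tool accepts raw SPARQL on your server, so treat it as an assumption rather than a documented name.

```python
# A minimal sketch of querying the triple store. "onto_query" is a hypothetical
# stand-in for the server's actual onto_* query tool; the query itself is
# vocabulary-agnostic and just samples triples from the combined graph.
EXPLORE = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 25
"""

async def explore_store(session):
    return await session.call_tool("onto_query", {"query": EXPLORE})
```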