---
name: llm-evaluation
author: JM Labs (Javier Montaño)
version: 1.0.0
description: >
  Model output quality assessment, hallucination detection, benchmark suites.
  [EXPLICIT] Trigger: "llm evaluation"
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
  - Bash
---
# LLM Evaluation
"Method over hacks."
## TL;DR
Assesses model output quality, detects hallucinations, and runs benchmark suites. [EXPLICIT]
## Procedure
### Step 1: Discover
- Gather evaluation inputs: prompts, model outputs, and any reference answers or requirements (sketched below)
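A minimal sketch of the gathering step, assuming evaluation cases live in a JSONL file with `prompt`, `output`, and optional `reference` fields; the path `evals/cases.jsonl` and the schema are illustrative assumptions, not something this skill prescribes.

```python
import json
from pathlib import Path

def load_eval_cases(path: str) -> list[dict]:
    """Load evaluation cases from a JSONL file.

    Assumed schema: one JSON object per line with 'prompt',
    'output', and an optional 'reference' (ground truth).
    """
    cases = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if line.strip():                  # skip blank lines
            cases.append(json.loads(line))
    return cases

cases = load_eval_cases("evals/cases.jsonl")  # hypothetical location
print(f"{len(cases)} cases, {sum('reference' in c for c in cases)} with references")
```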
### Step 2: Analyze
- Evaluate candidate assessment methods per Constitution XIII/XIV (decision shape sketched below)
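Constitution XIII/XIV are not reproduced in this document, so the selection rule below (references present means reference-based scoring, otherwise rubric-based review) is an assumed stand-in that shows only the shape of the decision, not the Constitution's actual criteria.

```python
def choose_method(cases: list[dict]) -> str:
    """Pick an evaluation method from the available artifacts.

    Stand-in rule only; the real selection criteria are defined in
    Constitution XIII/XIV, which this sketch does not encode.
    """
    if not cases:
        raise ValueError("no evaluation cases to analyze")
    with_refs = sum("reference" in c for c in cases)
    if with_refs == len(cases):
        return "reference-based"  # ground truth exists: compare directly
    if with_refs == 0:
        return "rubric-based"     # no ground truth: score against a rubric
    return "mixed"                # apply each method where it fits
```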
### Step 3: Execute
- Run the chosen evaluations and tag every finding with evidence tags (harness sketched below)
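A sketch of the execute step under two loud assumptions: the hallucination check (numbers in the output that appear in neither prompt nor reference) is a deliberately crude illustrative heuristic, not this skill's mandated detector, and the `[INFERRED]` tag is hypothetical; `[EXPLICIT]` is the only evidence tag shown elsewhere in this document.

```python
import re

def score_case(case: dict) -> dict:
    """Score one case and attach an evidence tag.

    The unsupported-numbers heuristic is illustrative only; real
    hallucination detection needs entailment checks or human review.
    [INFERRED] is an assumed companion tag to [EXPLICIT].
    """
    reference = case.get("reference", "")
    source = case["prompt"] + " " + reference
    numbers = re.findall(r"\d+(?:\.\d+)?", case["output"])
    return {
        "exact_match": case["output"].strip() == reference.strip() if reference else None,
        "unsupported_numbers": [n for n in numbers if n not in source],
        "evidence": "[EXPLICIT]" if reference else "[INFERRED]",
    }

# Inline cases for self-containment; in practice use Step 1's loader.
demo = [
    {"prompt": "What is 2+2?", "output": "4", "reference": "4"},
    {"prompt": "Name a prime.", "output": "17, first catalogued in 1998."},
]
for result in (score_case(c) for c in demo):
    print(result)
```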
### Step 4: Validate
- Verify that the quality criteria below are met (a validation sketch follows that list)
## Quality Criteria
- Evidence tags applied
- Constitution-compliant
- Actionable output
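A sketch of mechanically checking the first and third criteria, assuming findings are dicts with `evidence` and `action` fields (hypothetical names); Constitution compliance is not mechanically checkable here and is left to human review.

```python
KNOWN_TAGS = {"[EXPLICIT]", "[INFERRED]"}  # [INFERRED] is assumed, not sourced

def validate_findings(findings: list[dict]) -> list[str]:
    """Return quality-criteria violations for a findings report."""
    problems = []
    for i, finding in enumerate(findings):
        if finding.get("evidence") not in KNOWN_TAGS:
            problems.append(f"finding {i}: missing or unrecognized evidence tag")
        if not finding.get("action"):
            problems.append(f"finding {i}: no actionable recommendation")
    return problems

print(validate_findings([{"evidence": "[EXPLICIT]", "action": "tighten the prompt"}]))  # []
```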
## Usage
Example invocations:
- "/llm-evaluation" — Run the full llm evaluation workflow
- "llm evaluation on this project" — Apply to current context
## Assumptions & Limits
- Assumes access to project artifacts (code, docs, configs) [EXPLICIT]
- Requires English-language output unless otherwise specified [EXPLICIT]
- Does not replace domain expert judgment for final decisions [EXPLICIT]
## Edge Cases
| Scenario | Handling |
|---|---|
| Empty or minimal input | Request clarification before proceeding |
| Conflicting requirements | Flag conflicts explicitly, propose resolution |
| Out-of-scope request | Redirect to appropriate skill or escalate |
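One way the table's handling column could be encoded, assuming the caller has already judged scope and detected any conflicting requirements; the function name and return strings are illustrative, not part of this skill.

```python
def route(request: str, in_scope: bool, conflicts: list[str]) -> str:
    """Encode the handling column of the edge-case table above."""
    if not request.strip():
        return "request clarification"  # empty or minimal input
    if conflicts:
        # flag conflicts explicitly, then propose a resolution
        return "flag conflicts: " + "; ".join(conflicts)
    if not in_scope:
        return "redirect to appropriate skill or escalate"
    return "proceed"

print(route("evaluate these outputs", in_scope=True, conflicts=[]))  # proceed
```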