---
name: libeval
description: >
  libeval - RAG evaluation system. Evaluator orchestrates quality assessment
  using LLM-as-judge patterns. CriteriaEvaluator scores responses against
  rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator
  analyzes execution traces. EvalStore persists results. Use for automated
  quality testing, RAG pipeline evaluation, and agent performance testing.
---
# libeval Skill

## When to Use
- Evaluating RAG agent response quality
- Measuring retrieval recall and precision
- Running automated quality assessments
- Benchmarking agent performance over time
## Key Concepts

- **Evaluator**: Main orchestrator that runs test cases through the agent and collects metrics.
- **CriteriaEvaluator**: Uses LLM-as-judge to score responses against defined criteria and rubrics.
- **RecallEvaluator**: Measures how well the retrieval system returns relevant documents.
- **TraceEvaluator**: Analyzes execution traces for performance and correctness.
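- **EvalStore**: Persists evaluation results so runs can be compared over time.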
## Usage Patterns

### Pattern 1: Run evaluation suite

```javascript
import { Evaluator } from "@copilot-ld/libeval";

const evaluator = new Evaluator(config);
const results = await evaluator.run(testCases);
console.log(results.summary);
```
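The structure of individual test cases is not documented in this skill; the sketch below assumes each case pairs an input prompt with an expected answer, which is an illustrative guess rather than the library's confirmed schema.

```javascript
// Illustrative only: the `input` and `expected` field names are assumptions,
// not a confirmed @copilot-ld/libeval schema.
const testCases = [
  { input: "What is the refund policy?", expected: "Refunds are available within 30 days." },
  { input: "How do I rotate an API key?", expected: "Use the key rotation endpoint." },
];
```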
### Pattern 2: Criteria-based evaluation

```javascript
import { CriteriaEvaluator } from "@copilot-ld/libeval";

const criteria = new CriteriaEvaluator(llmClient);
const score = await criteria.evaluate(response, rubric);
```
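The rubric format is also not shown here; a minimal sketch, assuming a rubric is a plain object mapping criterion names to descriptions (the keys and structure are illustrative, not the library's confirmed schema):

```javascript
// Illustrative rubric; keys and structure are assumptions, not a
// documented @copilot-ld/libeval format.
const rubric = {
  accuracy: "The answer is factually consistent with the retrieved documents.",
  completeness: "The answer addresses every part of the question.",
  grounding: "Claims are supported by the cited sources, not invented.",
};

const score = await criteria.evaluate(response, rubric);
```

RecallEvaluator has no example above. As a rough sketch, recall measurement could compare retrieved document IDs against a known-relevant set; the constructor argument and the `evaluate(retrievedIds, relevantIds)` signature below are assumptions about the API, not documented behavior.

```javascript
import { RecallEvaluator } from "@copilot-ld/libeval";

// Assumed API: the constructor argument and evaluate() signature are
// illustrative, not confirmed by the package.
const recall = new RecallEvaluator(config);
const recallScore = await recall.evaluate(retrievedIds, relevantIds);
```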
## Integration

Configured via `config/eval.yml`. Run via `make eval`. Uses `libllm` for
LLM-as-judge.
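The description above says EvalStore persists results; a possible wiring is sketched below, assuming EvalStore exposes a `save` method, which is an assumption to verify against the package rather than a documented API.

```javascript
import { Evaluator, EvalStore } from "@copilot-ld/libeval";

// Assumed API: the EvalStore constructor argument and `save` method are
// illustrative, not confirmed by @copilot-ld/libeval.
const evaluator = new Evaluator(config);
const store = new EvalStore(config);

const results = await evaluator.run(testCases);
await store.save(results); // keep results for run-over-run benchmarking
```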