---
name: generalization-evaluator
description: Cross-domain evaluation to estimate generality and detect blind spots. Use when asked to assess broad capability, compare models across domains, or identify missing skills.
---
# Generalization Evaluator
Use this skill to measure a model's generality across domains and surface areas of weak coverage.
## Workflow
- Load a task set (start from `references/task_set.example.json`).
- Run every task through the same runner so results are comparable across domains.
- Score each task pass/fail and summarize pass rates by domain.
- Rank the resulting gaps by impact.
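The scoring-and-summarizing step above can be sketched as a small aggregation. The result shape (dicts with `domain`, `task_id`, and `passed` keys) is an assumption for illustration, not the actual output format of `run_eval.py`:

```python
from collections import defaultdict

def summarize_by_domain(results):
    """Aggregate per-task pass/fail results into per-domain pass rates.

    `results` is assumed to be a list of dicts like
    {"domain": "math", "task_id": "m1", "passed": True} -- a hypothetical
    shape; adapt the keys to whatever the runner actually emits.
    """
    totals = defaultdict(lambda: {"passed": 0, "total": 0})
    for r in results:
        bucket = totals[r["domain"]]
        bucket["total"] += 1
        bucket["passed"] += int(r["passed"])
    # Pass rate per domain, rounded for readability in the score table.
    return {
        domain: round(c["passed"] / c["total"], 3)
        for domain, c in totals.items()
    }

results = [
    {"domain": "math", "task_id": "m1", "passed": True},
    {"domain": "math", "task_id": "m2", "passed": False},
    {"domain": "code", "task_id": "c1", "passed": True},
]
print(summarize_by_domain(results))  # {'math': 0.5, 'code': 1.0}
```

Keeping the aggregation separate from the runner makes it easy to re-summarize old result files without re-running any tasks.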
## Scripts
- Run: `python scripts/run_eval.py --tasks references/task_set.example.json --runner ollama --model qwen3:latest`
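The task-set file format is not shown here; a minimal loader with validation might look like the sketch below. The fields `id`, `domain`, `prompt`, and `expected` are an assumed schema, not the actual contents of `references/task_set.example.json`:

```python
import json

# Hypothetical task-set schema; the real references/task_set.example.json
# may differ -- adjust REQUIRED_FIELDS to match it.
REQUIRED_FIELDS = {"id", "domain", "prompt", "expected"}

def load_task_set(raw_json):
    """Parse a JSON array of tasks and fail fast on missing fields."""
    tasks = json.loads(raw_json)
    for i, task in enumerate(tasks):
        missing = REQUIRED_FIELDS - task.keys()
        if missing:
            raise ValueError(f"task {i} missing fields: {sorted(missing)}")
    return tasks

example = '[{"id": "m1", "domain": "math", "prompt": "2+2?", "expected": "4"}]'
tasks = load_task_set(example)
print(len(tasks), tasks[0]["domain"])  # 1 math
```

Validating up front keeps a malformed task from failing halfway through a long evaluation run.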
## Output Expectations
- Provide a per-domain score table and a short summary of the weakest areas.
- List the top three skill gaps, each with a suggested skill action.