Evaluation framework patterns for RAG and LLMs, including faithfulness metrics, synthetic dataset generation, and LLM-as-a-judge patterns. Triggers: ragas, deepeval, llm-eval, faithfulness, hallucination-check, synthetic-data.
Skills (SKILL.md) are configuration files that add specific capabilities to AI agents (Claude Code, Cursor, Codex, etc.).
Find AILANG vs Python eval gaps and improve prompts/language. Use when user says 'find eval gaps', 'analyze benchmark failures', 'close Python-AILANG gap', or after running evals.
Review diff classification cases to determine if the LLM correctly categorized hunks, identified change type, and told a coherent story.
System for testing multi-agent behavior consistency across prompts, tools, skills, models, and agent configs.
EvalKit is a conversational evaluation framework for AI agents that guides you through creating robust evaluations using the Strands Evals SDK. Through natural conversation, you can plan evaluations, generate test data, execute evaluations, and analyze results.
Use when the user references architecture principles, at the start of a fresh conversation involving design work, before creating any requirements, design, or implementation documents, or when reviewing for compliance. Grounds problem framing and solution-making in all 9 principle categories, using citation-manager to extract full context.
Evaluate RAG systems with hit rate, MRR, and faithfulness metrics, and compare retrieval strategies. Use when testing retrieval quality, generating evaluation datasets, comparing embeddings or retrievers, A/B testing, or measuring production RAG performance.
Evaluate agent systems with quality gates and LLM-as-judge. Use when you need to measure component quality or implement quality gates. Not for simple unit testing or binary pass/fail checks without nuance.
Instrument evaluation metrics, quality scores, and feedback loops
Use when you need explicit quality criteria and scoring scales to evaluate work consistently, compare alternatives objectively, set acceptance thresholds, or reduce subjective bias, or when the user mentions rubric, scoring criteria, quality standards, evaluation framework, inter-rater reliability, or grading/assessing work.
Evaluate TappsCodingAgents framework effectiveness and provide continuous improvement recommendations. Use for analyzing usage patterns, workflow adherence, and code quality metrics.
EVE Online ESI API patterns and zkillboard integration. Covers ESI endpoints, authentication, data models, zkillboard RedisQ listener, caching strategies, and EVE-specific IDs (characters, corporations, alliances, ships, systems, regions). Use when working with killmail data, fetching EVE universe info, or implementing ESI calls.
Comprehensive EVE Online project management and ESI integration toolkit. Use when updating, auditing, or integrating ESI into EVE Online projects like EVE_Rebellion, EVE_Gatekeeper, EVE_Ships, or any EVE-related development. Triggers on project updates, ESI integration, compliance checking, asset management, or multi-project coordination.
Expert in temporal event detection, spatio-temporal clustering (ST-DBSCAN), and photo context understanding. Use for detecting photo events, clustering by time/location, shareability prediction, place recognition, event significance scoring, and life event detection. Activate on 'event detection', 'temporal clustering', 'ST-DBSCAN', 'spatio-temporal', 'shareability prediction', 'place recognition', 'life events', 'photo events', 'temporal diversity'. NOT for individual photo aesthetic quality (use photo-composition-critic), color palette analysis (use color-theory-palette-harmony-expert), face recognition implementation (use photo-content-recognition-curation-expert), or basic EXIF timestamp extraction.
Event-driven architecture patterns with event sourcing, CQRS, and message-driven communication. Use when designing distributed systems, microservices communication, or systems requiring eventual consistency and scalability.
Structure systems around asynchronous, event-based communication to decouple producers and consumers for improved scalability and resilience. Use when building loosely coupled systems with asynchronous message-based communication.
Add new events to Bob The Skull's event-driven architecture. Use when creating new events, event publishers, event handlers, or extending the event system with new event types.
JSONL event stream for external integrations and hooks
Use when generating branded QR codes for ProductTank SF events - speaker LinkedIn profiles, sponsor websites, or Slack join links. Handles single/bulk generation, correct logo mapping, GDrive upload, and mandatory test-scanning.
Create new event scraping scripts for websites. Use when adding a new event source to the Asheville Event Feed. ALWAYS start by detecting the CMS/platform and trying known API endpoints first. Browser scraping is NOT supported (Vercel limitation). Handles API-based, HTML/JSON-LD, and hybrid patterns with comprehensive testing workflows.
Record domain events and dispatch to inbox handlers for side effects, audit trails, and activity feeds. Use when building activity logs, syncing external services, or decoupling event creation from processing. Triggers on event recording, audit trails, activity feeds, or inbox patterns.
Event sourcing patterns and design decisions
Implement event sourcing and CQRS patterns using event stores, aggregates, and projections. Use when building audit trails, temporal queries, or systems requiring full history.
Event-driven design conventions: event envelope, naming, versioning, schema evolution rules, idempotency, ordering/partitioning, retry and dead-letter handling
Implements Clix event tracking (Clix.trackEvent) with consistent naming, safe
AWS EventBridge serverless event bus for event-driven architectures. Use when creating rules, configuring event patterns, setting up scheduled events, integrating with SaaS, or building cross-account event routing.
Use when spec and code diverge. The AI analyzes mismatches, recommends whether to update the spec or fix the code (with reasoning), and handles evolution under user control or via auto-updates.
Review French articles and translate them to English. Use when the user asks to review, check, or translate KB articles.
Search for relevant code snippets, examples, and documentation from billions of GitHub repositories, documentation pages, and Stack Overflow posts. Use this skill when coding tasks require real working code examples, API usage patterns, framework setup instructions, or library implementation details to eliminate hallucinations and provide accurate, token-efficient context.
Retrieve and extract content from URLs with AI-powered summarization and structured data extraction. Use for scraping web pages, extracting specific information, summarizing articles, or crawling websites with subpages.
Use for creating websets, running searches, importing CSV data, managing items, and adding enrichments to extract structured data.
Process CSV data files by cleaning, transforming, and analyzing them. Use this when users need to work with CSV files, clean data, or perform basic data analysis tasks.
- **Name**: Franklin - Orient Task Force - Rolling Presentation
Provides three production-ready ML training examples (sentiment classification, text generation, RedAI trade classifier) with complete training scripts, deployment configs, and datasets. Use when user needs example projects, reference implementations, starter templates, or wants to see working code for sentiment analysis, text generation, or financial trade classification.
Tests marketplace visibility configurations and catalog tiers (preview catalog only)
Analyze mock data and examples for cultural assumptions, understanding what they communicate about who the product is for. Use when reviewing test data, documentation, or seed data.
A skill for generating Excalidraw-format diagrams from natural language descriptions. This skill helps create visual representations of processes, systems, relationships, and ideas without manual draw
Generate architecture diagrams as .excalidraw files from codebase analysis. Use when the user asks to create architecture diagrams, system diagrams, visualize codebase structure, or generate excalidraw files.
**Core Philosophy**: Semantic redesign, not mechanical conversion. Think like both a presentation designer (clarity, accessibility, simplicity) and an artist (creative visual expression, spatial desig
Analyze messy and unstructured Excel files to identify data quality issues, detect format inconsistencies, find missing values, and generate comprehensive analysis reports. Use when Claude needs to work with Excel files (.xlsx, .xls) for data quality assessment, structure analysis, or when users request data auditing, cleaning recommendations, or statistical summaries of spreadsheet data.
Analyze Excel spreadsheet formulas to build dependency DAGs (Directed Acyclic Graphs) and understand calculation chains. This skill should be used when the user wants to reverse-engineer Excel formula dependencies, trace how values are calculated from inputs to outputs, validate formula logic, or create reusable calculators from spreadsheet logic.
Push Excel data back to BIM models. Update parameters, properties, and attributes from structured spreadsheets.
Excel Translation Skill
Measure quality. Descend toward excellence. No binary gates—only vectors.
Hierarchical exception system with HTTP status codes, machine-readable error codes, and structured responses for consistent API error handling across all endpoints.
Guide for creating exceptions using fastapi-problem that are automatically converted to RFC 9457 Problem Details responses.
Use when implementing exchange rate functionality - provides complete patterns for fetching BTC/fiat exchange rates from Coinbase API, caching strategies, conversion utilities, and React hooks for displaying rates in UI
ONLY when user explicitly types /exe-plan. Never auto-trigger on execute, run, or implement.
Deploy a Vibes app to exe.dev VM hosting. Uses nginx on persistent VMs with SSH automation. Supports client-side multi-tenancy via subdomain-based Fireproof database isolation.
Create or resume an execution plan - a design document that a coding agent can follow to deliver a working feature or system change