Generate concise summaries of source code at multiple scales. Use when users ask to summarize, explain, or understand code - whether it's a single function, a class, a module, or an entire codebase. Handles function-level code by explaining intention and core logic, and large codebases by providing high-level overviews with drill-down capabilities for specific modules.
Convert code between programming languages while preserving functionality and semantics. Use when: (1) Translating functions, classes, or modules between languages (Python, JavaScript/TypeScript, Java, Go, Rust, C/C++), (2) Migrating entire projects to a different language, (3) Needing idiomatic translation that follows target language conventions, (4) Converting between different paradigms (OOP to functional, etc.), (5) Porting legacy code to modern languages. Provides language-specific patterns, idiomatic translation guides, and project migration strategies.
Debug proof failures using counterexamples from Nitpick (Isabelle) or QuickChick (Coq) to identify specification errors, missing preconditions, and proof strategy issues. Use when: (1) A proof attempt fails and you need to understand why, (2) Counterexamples are generated by Nitpick or QuickChick, (3) Specifications may be incorrect or incomplete, (4) Theorems need validation before proving, (5) Missing preconditions or lemmas need identification, or (6) Proof failures need explanation and correction suggestions. Supports both Isabelle/HOL and Coq equally.
Explain why counterexamples violate specifications. Analyzes formal specifications (temporal logic, invariants, pre/postconditions, code contracts), informal requirements (user stories, acceptance criteria), and test specifications (assertions, property-based tests), then provides step-by-step traces showing state changes, compares expected vs actual behavior, identifies root causes, and assesses violation impact. Use when debugging test failures, understanding model checker output, explaining runtime assertion violations, analyzing static analysis warnings, or teaching specification concepts. Produces structured markdown explanations with traces, comparisons, state diagrams, and cause chains. Triggers when users ask why something failed, or ask to explain a violation, understand a counterexample, debug a specification, or analyze why a test fails.
Generate concrete counterexamples when formal verification, assertions, or specifications fail. Use this skill when debugging failed proofs, understanding why verification fails, creating minimal reproducing examples, analyzing assertion violations, investigating invariant breaks, or diagnosing specification mismatches. Produces concrete input values, execution traces, and state information that demonstrate the failure.
Automatically generates executable test cases from model checking counterexample traces. Translates abstract counterexample states and transitions into concrete test inputs, execution steps, and assertions that reproduce property violations. Use when working with model checker outputs (SPIN, CBMC, NuSMV, TLA+, Java PathFinder, etc.) and needing to create regression tests, validate bug fixes, or reproduce verification failures in executable test suites.
Generate targeted test inputs to reach specific code paths and hard-to-reach behaviors in Python code. Use when: (1) Targeting uncovered branches or specific execution paths, (2) Needing coverage-guided test generation, (3) Wanting to leverage LLM understanding of code semantics for meaningful test inputs, (4) Testing boundary conditions and edge cases systematically, (5) Combining symbolic reasoning with fuzzing. Provides path analysis, constraint solving, coverage-guided strategies, and LLM-driven semantic generation for comprehensive test input creation.
Generate setup scripts and instructions for development environments across platforms. Use when: (1) Setting up new development machines (Python, Node.js, Docker, databases), (2) Creating automated setup scripts for team onboarding, (3) Needing cross-platform setup instructions (macOS, Linux, Windows), (4) Installing development tools and dependencies, (5) Configuring version managers and package managers. Provides executable setup scripts, platform-specific guides, and tool installation instructions.
Explains test failures and provides actionable debugging guidance. Use when tests fail (unit, integration, E2E), builds fail, or code throws errors. Analyzes error messages, stack traces, and test output to identify root causes and suggest concrete fixes. Handles pytest, jest, junit, mocha, vitest, selenium, cypress, playwright, and other testing frameworks across Python, JavaScript/TypeScript, Java, Go, and other languages.
Selectively instruments code to capture runtime data for debugging failures and bugs. Use when investigating crashes, exceptions, unexpected behavior, test failures, or performance issues. Analyzes stack traces and error messages to identify suspicious code regions, then adds targeted logging, tracing, and assertions to capture variable values, execution paths, timing, and conditional branches. Supports Python, JavaScript/TypeScript, Java, and C/C++.
Generate formal specifications (definitions, predicates, invariants, pre/post-conditions) in Isabelle/HOL or Coq from informal requirements, source code, pseudocode, or mathematical descriptions. Use when users need to: (1) Formalize algorithms or data structures, (2) Create function specifications with contracts, (3) Generate predicates and properties for verification, (4) Translate informal requirements into formal logic, (5) Specify invariants for loops or data structures, or (6) Create formal definitions for mathematical concepts. Supports both Isabelle/HOL and Coq equally.
Automatically migrate Python web applications between frameworks (Flask → FastAPI, Django → FastAPI). Use when you need to migrate an existing web application to a modern framework while preserving functionality. The skill analyzes the codebase, updates routes, handlers, configuration, dependency injection patterns, and tests. Creates git commits for each migration phase and generates a comprehensive summary of all changes. Supports automatic dependency updates, code transformations, and test adaptations.
Generate complete, production-ready functions and classes from formal specifications, design descriptions, type signatures, or natural language requirements. Use this skill when implementing APIs from specifications, creating data structures from schemas, building classes from UML diagrams, generating code from contracts, or translating design documents into code. Supports multiple programming languages and follows language-specific best practices.
Automatically performs git bisect to identify the first bad commit that introduced a bug or failure. Use when debugging regressions, tracking down when a test started failing, or identifying which commit broke functionality. Handles flaky tests with retry logic and provides comprehensive reports with bisect logs and confidence levels.
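The bisect-with-retries idea in the entry above can be sketched without invoking git at all. This is a minimal illustration, assuming a linear history ordered oldest to newest with a known-good first commit and a known-bad last commit; `find_first_bad`, `is_bad`, and `retries` are hypothetical names chosen for the example, not part of git or of the skill itself.

```python
from typing import Callable, Sequence


def find_first_bad(
    commits: Sequence[str],
    is_bad: Callable[[str], bool],
    retries: int = 3,
) -> str:
    """Return the first bad commit in a history ordered oldest to newest.

    Assumes commits[0] is good and commits[-1] is bad. A commit counts as
    bad if any of `retries` runs fails; this is one possible (assumed)
    policy for tests that fail only intermittently.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")

    def fails(commit: str) -> bool:
        # Re-run the check up to `retries` times; stop at the first failure.
        return any(is_bad(commit) for _ in range(retries))

    good, bad = 0, len(commits) - 1
    # Invariant: commits[good] is good and commits[bad] is bad.
    while bad - good > 1:
        mid = (good + bad) // 2
        if fails(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]
```

In practice `is_bad` would check out the commit and run the failing test; here it is left as a plain callable so the search logic can be exercised on its own.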
Git expert combining atomic commits, rebase/squash, and history search (blame, bisect, log -S). Use for any git operations requiring structured commit strategies, history rewriting, or code archaeology. Triggers: 'commit', 'rebase', 'squash', 'who wrote', 'when was X added', 'find the commit that'.
Extract abstract mathematical models from imperative code (C, C++, Python, Java, etc.) suitable for formal reasoning in Coq. Use when the user asks to model imperative code in Coq, create Coq specifications from imperative programs, extract mathematical models for verification, or translate imperative algorithms to Coq for formal reasoning and proof.
Incrementally implement new features in Java repositories from natural language descriptions. Use when adding functionality to existing Java codebases (Maven or Gradle projects). Takes a feature description as input and outputs modified repository with implementation code, corresponding JUnit tests, and verification that all tests pass. Supports method additions, new class creation, and method modifications with proper Java conventions.
Takes a Python repository and natural language feature description as input, implements the feature with proper code placement, generates comprehensive tests, and ensures all tests pass. Use when Claude needs to: (1) Add new features to existing Python projects, (2) Implement functions, classes, or modules based on requirements, (3) Modify existing code to add functionality, (4) Generate unit and integration tests for new code, (5) Fix failing tests after implementation, (6) Ensure code follows existing patterns and conventions.
Generates hierarchical context files (CLAUDE.md) throughout a project directory tree, providing AI agents with directory-specific knowledge for better code understanding. Use when setting up a new project for AI-assisted development.
Generate integration tests for multiple interacting components in Python. Use when testing interactions between: (1) Multiple services or APIs (REST/GraphQL endpoints, microservices), (2) Database operations with repositories/ORMs (SQLAlchemy, Django ORM), (3) External services (payment gateways, email services, third-party APIs), (4) Message queues and event-driven systems, (5) Full stack workflows (API + database + business logic). Provides test structure templates, fixtures, test data builders, and patterns for pytest-based integration testing.
Analyze differences in program intervals between two versions of a program (old and new) to identify added, removed, or modified intervals. Use when comparing program versions, analyzing variable ranges, detecting behavioral changes in numeric computations, validating refactorings, or assessing migration impacts. Supports optional test suite integration to validate interval changes. Generates comprehensive reports highlighting intervals requiring further testing or verification.
Automatically updates regression tests based on interval analysis to maintain coverage of key program intervals. Use when code changes affect value ranges, conditionals, or control flow, and existing tests need updating to maintain interval coverage. Analyzes interval information from updated code, identifies coverage gaps, adjusts test inputs and assertions, removes redundant tests, and generates new tests for uncovered intervals. Supports Python, Java, JavaScript, and C/C++ with various test frameworks (pytest, JUnit, Jest, Google Test).
Profile programs at the function/method level to identify performance hotspots, bottlenecks, and optimization opportunities. Records execution time, memory usage, and call frequency for each interval. Generates actionable recommendations and visualizations. Use when users need to (1) analyze program performance, (2) identify slow functions or bottlenecks, (3) optimize execution time or memory usage, (4) profile Python, Java, or C/C++ programs with test cases or workload scenarios, or (5) generate performance reports with flame graphs and recommendations.
Automatically generate clear, actionable issue reports from failing tests and repository analysis. Analyze test failures to understand expected vs. actual behavior, identify affected code components, and produce well-structured Markdown reports suitable for GitHub Issues or similar trackers. Use when a test fails, when debugging issues, or when the user asks to create an issue report, generate a bug report, or document a test failure.
Automatically generate regression tests for Java codebases by analyzing changes between old and new code versions. Use when users need to: (1) Generate tests after refactoring or code changes, (2) Ensure previously tested behavior still works in new versions, (3) Cover modified or newly added code paths, (4) Migrate existing tests to work with updated APIs or signatures, (5) Maintain test coverage during code evolution. Supports JUnit and TestNG frameworks with unit tests, parameterized tests, and exception testing patterns.
Update Java test classes and methods to work with new code versions after refactoring or modifications. Use when code changes break existing tests due to signature changes, refactoring, or behavior modifications. Takes old and new code versions plus old tests as input, and outputs updated tests that compile and pass against the new code. Handles method signature changes, class refactoring, assertion updates, and mock modifications.
Recommend relevant Isabelle/HOL or Coq standard library theories, lemmas, and tactics based on proof goals. Use when: (1) Users need library lemmas for their proof, (2) Proof goals match standard library patterns, (3) Users ask what libraries to import, (4) Specific lemmas are needed for list/set/arithmetic operations, (5) Users are stuck and need to know what library support exists, or (6) Guidance on find_theorems/Search commands is needed. Supports both Isabelle/HOL and Coq standard libraries.
Intelligent code refactoring using IDE-level tools (rename, find-references, go-to-definition), AST-aware pattern matching, and TDD verification. Use for safe, large-scale refactoring with precision.
Automatically identify metamorphic properties (symmetry, linearity, additivity, input invariances) from programs or functions. Use when generating metamorphic tests, discovering program properties, validating transformations, or creating test oracles without explicit specifications. Analyzes control flow, data flow, and sample executions to output structured properties for metamorphic test generation and verification.
Generate test cases using metamorphic testing by applying transformations based on metamorphic properties. Use when you need to expand test suites, test programs without oracles, validate mathematical or algorithmic properties, or detect subtle bugs through input-output relationships. The skill takes a program, original test cases, and metamorphic properties as input, generates new test cases by applying transformations, executes tests, verifies outputs satisfy properties, reports violations and anomalies, and outputs an expanded test suite with property coverage summary. Supports multiple programming languages and property types.
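As a small illustration of the metamorphic approach described in the two entries above, the sketch below checks one relation for a sorting function: permuting the input must not change the output. The function name `check_permutation_invariance` and its parameters are assumptions made for this example, and the relation is just one of many that could be used.

```python
import random
from typing import Callable, List


def check_permutation_invariance(
    sort_fn: Callable[[List[int]], List[int]],
    source_input: List[int],
    trials: int = 20,
    seed: int = 0,
) -> List[List[int]]:
    """Return the follow-up inputs that violate the relation.

    Metamorphic relation: sort_fn(permutation of xs) == sort_fn(xs).
    No oracle for the "correct" output is needed, only the source output.
    An empty result means no violation was found, not that sort_fn is correct.
    """
    rng = random.Random(seed)
    expected = sort_fn(list(source_input))

    # Follow-up inputs: the reversed list, then random shuffles.
    follow_ups = [list(reversed(source_input))]
    for _ in range(trials):
        shuffled = list(source_input)
        rng.shuffle(shuffled)
        follow_ups.append(shuffled)

    # Pass a copy so sort_fn cannot mutate the recorded follow-up input.
    return [fu for fu in follow_ups if sort_fn(list(fu)) != expected]
```

Calling it with the built-in `sorted` reports no violations, while a function that returns its input unchanged is caught by the reversed follow-up input.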
Generate unit tests with proper mocking for Python (unittest.mock/pytest) or Java (Mockito/JUnit) code. Use when users request test generation, unit tests with mocks, or testing code that has external dependencies like database calls, API requests, file I/O, or network operations. Automatically identifies dependencies to mock and creates executable, maintainable test code.
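A minimal sketch of the kind of mocked unit test the entry above describes, using `unittest.mock` from the Python standard library. The function under test, `fetch_username`, and its HTTP-client interface are hypothetical and exist only to give the mock something to replace.

```python
from unittest.mock import Mock


def fetch_username(client, user_id: int) -> str:
    """Hypothetical code under test: depends on an external HTTP client."""
    response = client.get(f"/users/{user_id}")
    if response.status_code != 200:
        raise LookupError(f"user {user_id} not found")
    return response.json()["name"]


def test_fetch_username_returns_name():
    # Replace the network dependency with a mock that returns a canned response.
    client = Mock()
    client.get.return_value = Mock(
        status_code=200,
        json=Mock(return_value={"name": "Ada"}),
    )

    assert fetch_username(client, 7) == "Ada"
    # Verify the interaction with the dependency, not just the return value.
    client.get.assert_called_once_with("/users/7")


def test_fetch_username_raises_when_missing():
    client = Mock()
    client.get.return_value = Mock(status_code=404)

    try:
        fetch_username(client, 7)
    except LookupError:
        pass
    else:
        raise AssertionError("expected LookupError")
```

Both tests run under pytest as written, or can be called directly; no real network access takes place because the client is fully mocked.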
Compare behavior across multiple versions of programs or repositories. Use when you need to analyze how functionality changes between versions, identify regressions, compare outputs and exceptions, or validate upgrades. The skill compares execution behavior, test results, outputs, exceptions, and observable states across versions, generating detailed reports showing behavioral divergences, potential regressions, added/removed functionality, and areas requiring validation. Supports multiple programming languages and can work with test suites or execution traces.
Optimize test suites using mutation testing to maximize mutation kill rate with minimal tests. Use when you need to reduce test suite size while maintaining quality, identify redundant tests, improve mutation coverage, or validate test effectiveness. The skill analyzes test coverage, execution intervals, and redundancy using mutation operators, selects or generates a minimal subset of tests that maximizes mutation kill rate, and outputs an optimized test suite with detailed reports showing killed and surviving mutants. Supports multiple programming languages and mutation testing frameworks.
Transforms natural language requirements (user stories, verbal descriptions, business rules) into formal specifications and constraints. Use when converting informal requirements into structured, testable specifications with explicit constraints. Outputs in multiple formats including BDD-style Given-When-Then, JSON Schema, and structured plain text requirements documents.
Browser automation via Playwright for web testing, screenshots, form filling, scraping, and verification. Use when tasks require navigating websites, interacting with web pages, or testing web applications.
Generate Isabelle or Coq proofs establishing partial or total correctness of imperative programs from code and formal specifications. Use when users need to: (1) Prove program correctness using Hoare logic, (2) Generate verification conditions from pre/postconditions, (3) Construct loop invariants and termination arguments, (4) Verify imperative programs with assignments, conditionals, and loops. Supports both partial correctness (if the program terminates, the postcondition holds) and total correctness (the program terminates and the postcondition holds) for both Isabelle/HOL and Coq.
Extract abstract mathematical models from functional code (Haskell, OCaml, F#) for formal reasoning in Isabelle/HOL. Use when users need to: (1) Convert functional programs to Isabelle definitions, (2) Extract high-level algorithm essence from implementation code, (3) Generate formal specifications and properties from code, (4) Create verification-ready models that capture mathematical properties while abstracting away implementation details. Focuses on structural recursion, algebraic data types, higher-order functions, and invariant extraction.
Automatically generate TLA+ specifications from program code, repositories, or system implementations. Use when asked to generate TLA+ spec, create TLA+ specification from code, convert program to TLA+, formalize system in TLA+, extract TLA+ model from code, or when working with formal specification of concurrent systems, distributed systems, protocols, algorithms, or state machines that need to be verified.
Generate executable code together with formal proofs certifying safety and correctness properties in Isabelle/HOL or Coq. Use when building verified software, safety-critical systems, or when formal guarantees are required. Produces code with accompanying proofs for memory safety, bounds checking, functional correctness, invariant preservation, and termination. Supports extraction to OCaml/Haskell/SML and integration with existing codebases.
Extract programming-language-agnostic pseudocode from source code in any language, preserving control flow and logical structure while filtering out implementation details. Use when the user asks to convert code to pseudocode, abstract code logic, understand code structure without syntax, create language-independent documentation, or analyze algorithmic flow without language-specific details.
Converts pseudocode descriptions and algorithm specifications into complete, executable Java code. Use this skill when you need to implement algorithms from pseudocode, translate algorithm descriptions to Java, generate Java code from specifications, convert textbook algorithms to working code, or create executable implementations from high-level descriptions. Preserves logic and control flow, handles Java idioms and data structures, and includes test cases for verification.
Convert pseudocode, algorithm descriptions, or specifications into complete, executable Python code. Handles natural language descriptions, structured pseudocode, and formal algorithm specifications. Generates production-ready code with type hints, docstrings, error handling, and test cases. Use when users need to (1) convert pseudocode to Python, (2) implement algorithms from descriptions, (3) translate algorithm specifications to code, (4) generate Python implementations from textbook pseudocode, or (5) create executable code from high-level algorithm designs.
Automatically generates regression tests for Python codebases by analyzing changes between old and new code versions and their existing tests. Migrates tests to work with new code, generates tests for new functionality, and creates mocks for external dependencies. Supports unittest and pytest frameworks. Use when refactoring code, adding features, or ensuring backward compatibility.
Updates Python test code to work with new versions of the code being tested. Use when Claude needs to: (1) Update tests after code changes, (2) Fix broken tests due to signature changes, (3) Update assertions to match new behavior, (4) Add test cases for new functionality, (5) Analyze code differences and their test impact, (6) Run tests and fix failures based on error messages. Takes old code, new code, and old tests as input, outputs updated tests that pass.
Translate Python programs into equivalent Dafny code, preserving program semantics and ensuring the generated code is well-typed, executable, and verifiable. Use when the user asks to convert Python code to Dafny, port Python programs to Dafny, add formal verification to Python code, or create Dafny versions of Python algorithms with specifications.
Generate systematic refinement steps from high-level specifications to concrete implementations in Isabelle/HOL or Coq, preserving correctness obligations at each step. Use when working with formal verification, program refinement, proof development, or when translating abstract specifications into executable code while maintaining formal guarantees. Supports data refinement (abstract types → concrete structures), algorithmic refinement (specifications → algorithms), and stepwise refinement with proof obligations.