Generate concise summaries of source code at multiple scales. Use when users ask to summarize, explain, or understand code, whether it's a single function, a class, a module, or an entire codebase. Handles function-level code by explaining intention and core logic, and large codebases by providing high-level overviews with drill-down capabilities for specific modules.
Convert code between programming languages while preserving functionality and semantics. Use when: (1) Translating functions, classes, or modules between languages (Python, JavaScript/TypeScript, Java, Go, Rust, C/C++), (2) Migrating entire projects to a different language, (3) Need idiomatic translation that follows target language conventions, (4) Converting between different paradigms (OOP to functional, etc.), (5) Porting legacy code to modern languages. Provides language-specific patterns, idiomatic translation guides, and project migration strategies.
Debug proof failures using counterexamples from Nitpick (Isabelle) or QuickChick (Coq) to identify specification errors, missing preconditions, and proof strategy issues. Use when: (1) A proof attempt fails and you need to understand why, (2) Counterexamples are generated by Nitpick or QuickChick, (3) Specifications may be incorrect or incomplete, (4) Theorems need validation before proving, (5) Missing preconditions or lemmas need identification, or (6) Proof failures need explanation and correction suggestions. Supports both Isabelle/HOL and Coq equally.
Explain why counterexamples violate specifications by analyzing formal specifications (temporal logic, invariants, pre/postconditions, code contracts), informal requirements (user stories, acceptance criteria), test specifications (assertions, property-based tests), and providing step-by-step traces showing state changes, comparing expected vs actual behavior, identifying root causes, and assessing violation impact. Use when debugging test failures, understanding model checker output, explaining runtime assertion violations, analyzing static analysis warnings, or teaching specification concepts. Produces structured markdown explanations with traces, comparisons, state diagrams, and cause chains. Triggers when users ask why something failed, explain a violation, understand a counterexample, debug a specification, or analyze why a test fails.
Generate concrete counterexamples when formal verification, assertions, or specifications fail. Use this skill when debugging failed proofs, understanding why verification fails, creating minimal reproducing examples, analyzing assertion violations, investigating invariant breaks, or diagnosing specification mismatches. Produces concrete input values, execution traces, and state information that demonstrate the failure.
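For illustration, a minimal sketch of the kind of artifact this skill aims to produce (concrete inputs plus an execution trace), using a hypothetical `mean_of_window` function whose spec is missing a precondition; the function and values are illustrative, not from any real codebase:

```python
def mean_of_window(values, start, width):
    """Hypothetical function whose spec claims the result is bounded
    by min(values) and max(values); false for an empty window."""
    window = values[start:start + width]
    return sum(window) / len(window)  # ZeroDivisionError when width == 0

# Concrete counterexample demonstrating the failure:
values, start, width = [1, 2, 3], 1, 0
try:
    mean_of_window(values, start, width)
except ZeroDivisionError as exc:
    # Execution trace: window = values[1:1] = [], len(window) = 0
    print(f"counterexample: values={values}, start={start}, width={width}")
    print(f"violation: {exc!r}; spec is missing precondition width >= 1")
```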
Automatically generates executable test cases from model checking counterexample traces. Translates abstract counterexample states and transitions into concrete test inputs, execution steps, and assertions that reproduce property violations. Use when working with model checker outputs (SPIN, CBMC, NuSMV, TLA+, Java PathFinder, etc.) and needing to create regression tests, validate bug fixes, or reproduce verification failures in executable test suites.
Generate targeted test inputs to reach specific code paths and hard-to-reach behaviors in Python code. Use when: (1) Targeting uncovered branches or specific execution paths, (2) Need coverage-guided test generation, (3) Want to leverage LLM understanding of code semantics for meaningful test inputs, (4) Testing boundary conditions and edge cases systematically, (5) Combining symbolic reasoning with fuzzing. Provides path analysis, constraint solving, coverage-guided strategies, and LLM-driven semantic generation for comprehensive test input creation.
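A minimal sketch of the targeted-input idea, using a hypothetical `classify_triangle` function: one input is chosen per branch by reasoning about the path conditions, the way the skill combines semantic understanding with path analysis.

```python
def classify_triangle(a: int, b: int, c: int) -> str:
    if a + b <= c or b + c <= a or a + c <= b:
        return "invalid"        # reachable only for (near-)degenerate inputs
    if a == b == c:
        return "equilateral"
    if a == b or b == c or a == c:
        return "isosceles"
    return "scalene"

# One input per branch, chosen by reasoning about the path conditions:
targets = {
    "invalid":     (1, 2, 3),   # a + b == c, degenerate triangle
    "equilateral": (5, 5, 5),
    "isosceles":   (5, 5, 8),
    "scalene":     (4, 5, 6),
}
for expected, args in targets.items():
    assert classify_triangle(*args) == expected
```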
Automatically identify potential boundary and exception cases from requirements, specifications, or existing code, and generate comprehensive test cases targeting boundary conditions, edge cases, and uncommon scenarios. Use this skill when analyzing programs, code repositories, functions, or APIs to discover and test corner cases, null handling, overflow conditions, empty inputs, concurrent access patterns, and other exceptional scenarios that are often missed in standard testing.
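A sketch of the output style, assuming pytest and a hypothetical `paginate` function; the parametrized cases target empty inputs, minimal inputs, off-the-end pages, and the exception path:

```python
import pytest

def paginate(items: list, page: int, size: int) -> list:
    """Hypothetical function under test."""
    if size <= 0:
        raise ValueError("size must be positive")
    return items[(page - 1) * size : page * size]

# Boundary and exception cases that standard testing often misses:
@pytest.mark.parametrize("items,page,size,expected", [
    ([], 1, 10, []),            # empty input
    ([1], 1, 1, [1]),           # minimal non-empty input
    ([1, 2, 3], 2, 2, [3]),     # last, partially filled page
    ([1, 2, 3], 99, 2, []),     # page past the end
])
def test_paginate_boundaries(items, page, size, expected):
    assert paginate(items, page, size) == expected

def test_paginate_rejects_zero_size():
    with pytest.raises(ValueError):
        paginate([1, 2, 3], 1, 0)   # invalid-size exception path
```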
Generate setup scripts and instructions for development environments across platforms. Use when: (1) Setting up new development machines (Python, Node.js, Docker, databases), (2) Creating automated setup scripts for team onboarding, (3) Need cross-platform setup instructions (macOS, Linux, Windows), (4) Installing development tools and dependencies, (5) Configuring version managers and package managers. Provides executable setup scripts, platform-specific guides, and tool installation instructions.
Explains test failures and provides actionable debugging guidance. Use when tests fail (unit, integration, E2E), builds fail, or code throws errors. Analyzes error messages, stack traces, and test output to identify root causes and suggest concrete fixes. Handles pytest, Jest, JUnit, Mocha, Vitest, Selenium, Cypress, Playwright, and other testing frameworks across Python, JavaScript/TypeScript, Java, Go, and other languages.
Selectively instruments code to capture runtime data for debugging failures and bugs. Use when investigating crashes, exceptions, unexpected behavior, test failures, or performance issues. Analyzes stack traces and error messages to identify suspicious code regions, then adds targeted logging, tracing, and assertions to capture variable values, execution paths, timing, and conditional branches. Supports Python, JavaScript/TypeScript, Java, and C/C++.
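As a sketch of the kind of instrumentation the skill inserts, the decorator below (a hypothetical helper, not the skill's actual mechanism) captures arguments, results, exceptions, and timing around a function implicated by a stack trace:

```python
import functools, logging, time

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")

def trace(fn):
    """Capture arguments, return value or exception, and elapsed time."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        logging.debug("enter %s args=%r kwargs=%r", fn.__name__, args, kwargs)
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
        except Exception as exc:
            logging.debug("raise %s: %r", fn.__name__, exc)
            raise
        logging.debug("exit %s -> %r (%.3f ms)", fn.__name__, result,
                      (time.perf_counter() - start) * 1e3)
        return result
    return wrapper

@trace  # applied only to the region implicated by the stack trace
def parse_price(raw: str) -> float:
    return float(raw.strip().lstrip("$"))

parse_price(" $19.99 ")
```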
Generate formal specifications (definitions, predicates, invariants, pre/post-conditions) in Isabelle/HOL or Coq from informal requirements, source code, pseudocode, or mathematical descriptions. Use when users need to: (1) Formalize algorithms or data structures, (2) Create function specifications with contracts, (3) Generate predicates and properties for verification, (4) Translate informal requirements into formal logic, (5) Specify invariants for loops or data structures, or (6) Create formal definitions for mathematical concepts. Supports both Isabelle/HOL and Coq equally.
Automatically migrate Python web applications between frameworks (Flask → FastAPI, Django → FastAPI). Use when you need to migrate an existing web application to a modern framework while preserving functionality. The skill analyzes the codebase, updates routes, handlers, configuration, dependency injection patterns, and tests. Creates git commits for each migration phase and generates a comprehensive summary of all changes. Supports automatic dependency updates, code transformations, and test adaptations.
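A minimal sketch of one such route transformation, using a hypothetical `/items` endpoint; the Flask original is shown in comments, and the FastAPI version moves parsing and validation into the typed signature:

```python
# Before (Flask): manual query parsing and explicit JSON serialization.
#
# from flask import Flask, jsonify, request
# app = Flask(__name__)
#
# @app.route("/items/<int:item_id>", methods=["GET"])
# def get_item(item_id):
#     verbose = request.args.get("verbose", "false") == "true"
#     return jsonify({"id": item_id, "verbose": verbose})

# After (FastAPI): types in the signature drive parsing and validation,
# and dicts are serialized to JSON automatically.
from fastapi import FastAPI

app = FastAPI()

@app.get("/items/{item_id}")
def get_item(item_id: int, verbose: bool = False):
    return {"id": item_id, "verbose": verbose}
```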
Generate complete, production-ready functions and classes from formal specifications, design descriptions, type signatures, or natural language requirements. Use this skill when implementing APIs from specifications, creating data structures from schemas, building classes from UML diagrams, generating code from contracts, or translating design documents into code. Supports multiple programming languages and follows language-specific best practices.
Generate randomized and edge-case inputs to detect unexpected failures, bugs, and security vulnerabilities through fuzz testing. Use when creating test cases for robustness testing, generating adversarial inputs, testing error handling, finding edge cases, or security testing. Produces Python test code with fuzzing inputs for strings, numbers, and structured data focusing on edge cases, invalid inputs, and random valid inputs. Triggers when users ask to generate fuzz tests, create randomized test inputs, test edge cases, find bugs through fuzzing, or generate adversarial test cases.
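A sketch of the generated fuzzing style, with a hypothetical `parse_key_value` target; the generator mixes fixed adversarial strings (empty, control characters, injection-style text, very long input) with seeded random ones:

```python
import random, string

def fuzz_strings(n: int = 100, seed: int = 0):
    """Yield adversarial and random string inputs."""
    rng = random.Random(seed)
    fixed = ["", " ", "\x00", "\n\r\t", "'; DROP TABLE users;--",
             "%s%s%n", "<script>", "\ufeff", "A" * 10_000]
    yield from fixed
    for _ in range(n - len(fixed)):
        length = rng.choice([1, 2, 7, 63, 64, 65, 255, 256, 1024])
        yield "".join(rng.choice(string.printable) for _ in range(length))

def parse_key_value(line: str) -> tuple:
    """Hypothetical target; fuzzing reveals it crashes on inputs
    without '=' (ValueError from tuple unpacking)."""
    key, value = line.split("=")
    return key.strip(), value.strip()

for candidate in fuzz_strings():
    try:
        parse_key_value(candidate)
    except ValueError:
        print(f"crash input found: {candidate[:40]!r}")
        break
```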
Git expert combining atomic commits, rebase/squash, and history search (blame, bisect, log -S). Use for any git operations requiring structured commit strategies, history rewriting, or code archaeology. Triggers: 'commit', 'rebase', 'squash', 'who wrote', 'when was X added', 'find the commit that'.
Extract abstract mathematical models from imperative code (C, C++, Python, Java, etc.) suitable for formal reasoning in Coq. Use when the user asks to model imperative code in Coq, create Coq specifications from imperative programs, extract mathematical models for verification, or translate imperative algorithms to Coq for formal reasoning and proof.
Incrementally implement new features in Java repositories from natural language descriptions. Use when adding functionality to existing Java codebases (Maven or Gradle projects). Takes a feature description as input and outputs modified repository with implementation code, corresponding JUnit tests, and verification that all tests pass. Supports method additions, new class creation, and method modifications with proper Java conventions.
Takes a Python repository and a natural language feature description as input, implements the feature with proper code placement, generates comprehensive tests, and ensures all tests pass. Use when Claude needs to: (1) Add new features to existing Python projects, (2) Implement functions, classes, or modules based on requirements, (3) Modify existing code to add functionality, (4) Generate unit and integration tests for new code, (5) Fix failing tests after implementation, (6) Ensure code follows existing patterns and conventions.
[TODO: Complete and informative explanation of what the skill does and when to use it. Include WHEN to use this skill - specific scenarios, file types, or tasks that trigger it.]
Analyze differences in program intervals between two versions of a program (old and new) to identify added, removed, or modified intervals. Use when comparing program versions, analyzing variable ranges, detecting behavioral changes in numeric computations, validating refactorings, or assessing migration impacts. Supports optional test suite integration to validate interval changes. Generates comprehensive reports highlighting intervals requiring further testing or verification.
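A toy illustration of the interval change the skill would flag, using hypothetical `brightness_old`/`brightness_new` functions whose output interval widens from [0, 100] to [0, 255]; the dynamic sampling here is only a stand-in for the skill's actual analysis:

```python
# Old version: output interval is [0.0, 100.0].
def brightness_old(raw: int) -> float:
    return min(max(raw, 0), 100) * 1.0

# New version: output interval widened to [0.0, 255.0].
def brightness_new(raw: int) -> float:
    return min(max(raw, 0), 255) * 1.0

# A coarse dynamic check over a shared input range makes the
# modified interval visible; static analysis would derive it exactly.
samples = range(-50, 400)
old_iv = (min(map(brightness_old, samples)), max(map(brightness_old, samples)))
new_iv = (min(map(brightness_new, samples)), max(map(brightness_new, samples)))
print(f"old: {old_iv}, new: {new_iv}")  # flagged: upper bound 100.0 -> 255.0
```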
Automatically updates regression tests based on interval analysis to maintain coverage of key program intervals. Use when code changes affect value ranges, conditionals, or control flow, and existing tests need updating to maintain interval coverage. Analyzes interval information from updated code, identifies coverage gaps, adjusts test inputs and assertions, removes redundant tests, and generates new tests for uncovered intervals. Supports Python, Java, JavaScript, and C/C++ with various test frameworks (pytest, JUnit, Jest, Google Test).
Profile programs at the function/method level to identify performance hotspots, bottlenecks, and optimization opportunities. Records execution time, memory usage, and call frequency for each interval. Generates actionable recommendations and visualizations. Use when users need to (1) analyze program performance, (2) identify slow functions or bottlenecks, (3) optimize execution time or memory usage, (4) profile Python, Java, or C/C++ programs with test cases or workload scenarios, or (5) generate performance reports with flame graphs and recommendations.
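A minimal sketch using Python's standard `cProfile`/`pstats` modules (one of several profilers the skill could drive); the quadratic `slow_duplicates` is a hypothetical hotspot:

```python
import cProfile, pstats, io

def slow_duplicates(items):
    # Quadratic membership test: the hotspot a profile should surface.
    seen, dupes = [], []
    for x in items:
        if x in seen:
            dupes.append(x)
        else:
            seen.append(x)
    return dupes

profiler = cProfile.Profile()
profiler.enable()
slow_duplicates(list(range(2000)) * 2)
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # per-function cumulative time and call counts
```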
Automatically generate clear, actionable issue reports from failing tests and repository analysis. Analyze test failures to understand expected vs. actual behavior, identify affected code components, and produce well-structured Markdown reports suitable for GitHub Issues or similar trackers. Use when a test fails, when debugging issues, or when the user asks to create an issue report, generate a bug report, or document a test failure.
Update Java test classes and methods to work with new code versions after refactoring or modifications. Use when code changes break existing tests due to signature changes, refactoring, or behavior modifications. Takes old and new code versions plus old tests as input, and outputs updated tests that compile and pass against the new code. Handles method signature changes, class refactoring, assertion updates, and mock modifications.
Recommend relevant Isabelle/HOL or Coq standard library theories, lemmas, and tactics based on proof goals. Use when: (1) Users need library lemmas for their proof, (2) Proof goals match standard library patterns, (3) Users ask what libraries to import, (4) Specific lemmas are needed for list/set/arithmetic operations, (5) Users are stuck and need to know what library support exists, or (6) Guidance on find_theorems/Search commands is needed. Supports both Isabelle/HOL and Coq standard libraries.
Generate test cases using metamorphic testing by applying transformations based on metamorphic properties. Use when you need to expand test suites, test programs without oracles, validate mathematical or algorithmic properties, or detect subtle bugs through input-output relationships. The skill takes a program, original test cases, and metamorphic properties as input, generates new test cases by applying transformations, executes tests, verifies that outputs satisfy the properties, reports violations and anomalies, and outputs an expanded test suite with a property coverage summary. Supports multiple programming languages and property types.
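A compact sketch, assuming a hypothetical mean-computing program and two standard metamorphic properties (permutation invariance and linear scaling); no oracle is needed, only the input-output relation:

```python
import random

def program_under_test(xs: list) -> float:
    """Hypothetical target: mean of a non-empty list."""
    return sum(xs) / len(xs)

rng = random.Random(42)
for _ in range(100):
    xs = [rng.uniform(-1e3, 1e3) for _ in range(rng.randint(1, 50))]

    # Property 1: permuting the input must not change the mean.
    shuffled = xs[:]
    rng.shuffle(shuffled)
    assert abs(program_under_test(xs) - program_under_test(shuffled)) < 1e-9

    # Property 2: scaling every element by k scales the mean by k.
    k = rng.uniform(0.5, 2.0)
    scaled = [k * x for x in xs]
    assert abs(program_under_test(scaled) - k * program_under_test(xs)) < 1e-6
```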
Generate unit tests with proper mocking for Python (unittest.mock/pytest) or Java (Mockito/JUnit) code. Use when users request test generation, unit tests with mocks, or testing code that has external dependencies like database calls, API requests, file I/O, or network operations. Automatically identifies dependencies to mock and creates executable, maintainable test code.
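A minimal sketch for the Python side, assuming `requests` is installed and a hypothetical `fetch_username` under test; the external HTTP call is the dependency the skill would identify and mock:

```python
import unittest
from unittest.mock import patch, Mock

def fetch_username(user_id: int) -> str:
    """Hypothetical code under test with an external HTTP dependency."""
    import requests
    response = requests.get(f"https://api.example.com/users/{user_id}")
    response.raise_for_status()
    return response.json()["name"]

class TestFetchUsername(unittest.TestCase):
    @patch("requests.get")
    def test_returns_name_from_api(self, mock_get):
        # Stub the response object so no network call happens.
        mock_get.return_value = Mock(
            json=Mock(return_value={"name": "ada"}),
            raise_for_status=Mock(),
        )
        self.assertEqual(fetch_username(7), "ada")
        mock_get.assert_called_once_with("https://api.example.com/users/7")

if __name__ == "__main__":
    unittest.main()
```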
Transforms natural language requirements (user stories, verbal descriptions, business rules) into formal specifications and constraints. Use when converting informal requirements into structured, testable specifications with explicit constraints. Outputs in multiple formats including BDD-style Given-When-Then, JSON Schema, and structured plain text requirements documents.
Automatically generate TLA+ specifications from program code, repositories, or system implementations. Use when asked to generate TLA+ spec, create TLA+ specification from code, convert program to TLA+, formalize system in TLA+, extract TLA+ model from code, or when working with formal specification of concurrent systems, distributed systems, protocols, algorithms, or state machines that need to be verified.
Generate executable code together with formal proofs certifying safety and correctness properties in Isabelle/HOL or Coq. Use when building verified software, safety-critical systems, or when formal guarantees are required. Produces code with accompanying proofs for memory safety, bounds checking, functional correctness, invariant preservation, and termination. Supports extraction to OCaml/Haskell/SML and integration with existing codebases.
Extract programming-language-agnostic pseudocode from source code in any language, preserving control flow and logical structure while filtering out implementation details. Use when the user asks to convert code to pseudocode, abstract code logic, understand code structure without syntax, create language-independent documentation, or analyze algorithmic flow without language-specific details.
Converts pseudocode descriptions and algorithm specifications into complete, executable Java code. Use this skill when you need to implement algorithms from pseudocode, translate algorithm descriptions to Java, generate Java code from specifications, convert textbook algorithms to working code, or create executable implementations from high-level descriptions. Preserves logic and control flow while handling Java idioms, data structures, and includes test cases for verification.
Convert pseudocode, algorithm descriptions, or specifications into complete, executable Python code. Handles natural language descriptions, structured pseudocode, and formal algorithm specifications. Generates production-ready code with type hints, docstrings, error handling, and test cases. Use when users need to (1) convert pseudocode to Python, (2) implement algorithms from descriptions, (3) translate algorithm specifications to code, (4) generate Python implementations from textbook pseudocode, or (5) create executable code from high-level algorithm designs.
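A small worked example of the conversion, using textbook binary-search pseudocode; the pseudocode wording and function name are illustrative:

```python
# Pseudocode:
#   BINARY-SEARCH(A, target):
#     lo <- 0; hi <- length(A) - 1
#     while lo <= hi:
#       mid <- floor((lo + hi) / 2)
#       if A[mid] == target: return mid
#       else if A[mid] < target: lo <- mid + 1
#       else: hi <- mid - 1
#     return NOT-FOUND

def binary_search(a: list[int], target: int) -> int | None:
    """Return an index of `target` in sorted list `a`, or None if absent."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

# Test cases for verification:
assert binary_search([1, 3, 5, 7, 9], 7) == 3
assert binary_search([1, 3, 5, 7, 9], 4) is None
assert binary_search([], 1) is None
```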
Automatically generates regression tests for Python codebases by analyzing changes between old and new code versions and their existing tests. Migrates tests to work with new code, generates tests for new functionality, and creates mocks for external dependencies. Supports unittest and pytest frameworks. Use when refactoring code, adding features, or ensuring backward compatibility.
Updates Python test code to work with new versions of the code being tested. Use when Claude needs to: (1) Update tests after code changes, (2) Fix broken tests due to signature changes, (3) Update assertions to match new behavior, (4) Add test cases for new functionality, (5) Analyze code differences and their test impact, (6) Run tests and fix failures based on error messages. Takes old code, new code, and old tests as input, outputs updated tests that pass.
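A toy before/after, with a hypothetical `total` function that gained a keyword-only `tax_rate` parameter; the old test is kept where its expectation still holds, and a new case covers the new behavior:

```python
# Old code:                      New code (keyword-only parameter added):
# def total(prices):             def total(prices, *, tax_rate=0.0):
#     return sum(prices)             return sum(prices) * (1 + tax_rate)

def total(prices, *, tax_rate=0.0):
    return sum(prices) * (1 + tax_rate)

# Old test, unchanged: the default tax_rate preserves the old behavior.
def test_total_without_tax():
    assert total([1.0, 2.0]) == 3.0

# New test, added for the new parameter (0.5 chosen to stay float-exact).
def test_total_with_tax():
    assert total([10.0], tax_rate=0.5) == 15.0
```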
Translate Python programs into equivalent Dafny code, preserving program semantics and ensuring the generated code is well-typed, executable, and verifiable. Use when the user asks to convert Python code to Dafny, port Python programs to Dafny, add formal verification to Python code, or create Dafny versions of Python algorithms with specifications.
Generate systematic refinement steps from high-level specifications to concrete implementations in Isabelle/HOL or Coq, preserving correctness obligations at each step. Use when working with formal verification, program refinement, proof development, or when translating abstract specifications into executable code while maintaining formal guarantees. Supports data refinement (abstract types → concrete structures), algorithmic refinement (specifications → algorithms), and stepwise refinement with proof obligations.
Checks whether a new version of a repository preserves the behavior observed by tests on the old version. Use this skill when comparing two versions of code to detect regressions, verify refactoring safety, validate bug fixes don't break existing functionality, or ensure backward compatibility. Detects differences in function outputs, exceptions, observable states, and performance between versions. Generates reports highlighting potential regressions (critical, high, medium, low severity), improvements, and areas requiring verification. Triggers when users ask to check for regressions between versions, compare test behavior across versions, verify behavior preservation, or validate that changes don't break existing tests.
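A simplified sketch of the comparison idea, with hypothetical `head_old`/`head_new` versions and a tiny harness that records return values and raised exception types for both; the real skill also compares observable state and performance:

```python
def check_preservation(old_fn, new_fn, inputs):
    """Run both versions on the same inputs and report divergences
    in return values and exception types."""
    regressions = []
    for args in inputs:
        def observe(fn):
            try:
                return ("ok", fn(*args))
            except Exception as exc:
                return ("raise", type(exc).__name__)
        old, new = observe(old_fn), observe(new_fn)
        if old != new:
            regressions.append((args, old, new))
    return regressions

# Hypothetical versions: the refactor changed empty-input behavior.
def head_old(xs): return xs[0]                      # IndexError on ()
def head_new(xs): return xs[0] if xs else None      # now returns None

print(check_preservation(head_old, head_new, [((1, 2),), ((),)]))
# [(((),), ('raise', 'IndexError'), ('ok', None))]
```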
Locate root causes of failing regression tests by analyzing code changes, error messages, and test dependencies. Use when regression tests start failing after code changes, investigating test failures in CI/CD, debugging flaky tests, or understanding why previously passing tests now fail. Analyzes git diffs, stack traces, test output, and dependency changes to produce structured markdown reports ranking likely causes. Triggers when users ask to find why tests are failing, debug regression failures, investigate test breakage, or analyze failing test suites.
Compares HEAD with the latest published version to analyze real changes, group them by type, and recommend a version bump. Use before publishing a release to understand what actually changed.
Generates comprehensive test scenarios from requirements including BDD/Gherkin scenarios, unit tests, integration tests, and end-to-end test cases. Use when converting requirements, user stories, or specifications into testable scenarios with full coverage including happy paths, error cases, edge cases, and boundary conditions. Outputs structured test suites ready for implementation.
Compares old and new requirement documents, analyzes code repository impact, and generates detailed modification plans. Use when Claude needs to: (1) Compare requirement versions and identify changes, (2) Map requirement changes to code components, (3) Identify components to modify, delete, or add, (4) Analyze dependencies and integration points, (5) Assess test impact, (6) Generate comprehensive modification plans in Markdown format. Supports text/Markdown requirements and analyzes feature-level, functional, and API changes.
Iteratively enhance user requirements into clear, complete, actionable specifications through analysis and clarification. Use when: (1) Users provide initial requirements that need refinement, (2) Requirements are vague, incomplete, or ambiguous, (3) Creating formal specifications from informal descriptions, (4) Identifying missing constraints, edge cases, or acceptance criteria, (5) Clarifying assumptions and implicit dependencies, or (6) Preparing requirements for design, implementation, or verification. Operates interactively: analyze → ask clarifying questions → refine based on responses.
Generate concise, structured summaries of requirements for quick team understanding. Use when analyzing requirements from text documents (MD, TXT, DOCX) or technical specifications to create bullet-point summaries that highlight core functionality and dependencies/constraints. Ideal for sprint planning, stakeholder updates, team onboarding, or any situation requiring rapid comprehension of requirement documents.
Automatically derives TLA+ properties (invariants, safety, liveness) from natural-language requirements or structured requirement documents. Resolves ambiguities, asks clarifying questions for underspecified requirements, and outputs TLA+-compatible property definitions with semantic explanations. Use when translating system requirements, specifications, or behavioral constraints into formal TLA+ temporal logic properties for verification with TLC model checker.
Automatically infer formal correctness properties from Verilog/SystemVerilog RTL code and generate SystemVerilog Assertions (SVA). Identifies control-flow invariants (mutual exclusion, valid-ready handshakes, pipeline ordering, safety properties), liveness expectations, and temporal properties. Use when working with RTL designs that need formal property generation, when adding assertions to existing RTL, or when users ask to infer properties, generate assertions, or create formal specifications from hardware designs.
Check behavioral consistency between high-level hardware specifications and RTL implementations. Use when asked to check RTL consistency, verify RTL against spec, check hardware specification compliance, validate RTL implementation, find spec violations in RTL, check behavioral consistency, or when working with hardware designs that need verification against protocol specifications, timing requirements, or functional specifications in Verilog, VHDL, or SystemVerilog.