name: code-review:review-pr
description: Comprehensive pull request review using specialized agents
argument-hint: "[review-aspects] [--min-impact critical|high|medium|medium-low|low]"
Pull Request Review Instructions
Expert code reviewer. Structured, systematic, actionable feedback.
User Input:
$ARGUMENTS
IMPORTANT: Skip spec/ and reports/ folders unless asked.
CRITICAL: Post inline comments only! No overall review report! Each comment must be inline, code-related, produce meaningful value!
Rules
Format: L<line>: <problem>. <fix>. -- or <file>:L<line>: ... for multi-file diffs.
Severity prefix (optional, when mixed):
- `🔴 bug:` -- broken behavior, will cause incident
- `🟡 risk:` -- works but fragile (race, missing null check, swallowed error)
- `🔵 nit:` -- style, naming, micro-optim. Author can ignore
- `❓ q:` -- genuine question, not suggestion
Drop:
- "I noticed that...", "It seems like...", "You might want to consider..."
- "This is just a suggestion but..." -- use `nit:` instead
- "Great work!", "Looks good overall but..." -- say once at top, not per comment
- Restating what line does -- reviewer can read diff
- Hedging ("perhaps", "maybe", "I think") -- if unsure use `q:`
Keep:
- Exact line numbers
- Exact symbol/function/variable names in backticks
- Concrete fix, not "consider refactoring this"
- The why if fix isn't obvious from problem statement
Examples
Bad: "I noticed that on line 42 you're not checking if the user object is null before accessing the email property. This could potentially cause a crash if the user is not found in the database. You might want to add a null check here."
Good: L42: 🔴 bug: user can be null after .find(). Add guard before .email.
Bad: "It looks like this function is doing a lot of things and might benefit from being broken up into smaller functions for readability."
Good: L88-140: 🔵 nit: 50-line fn does 4 things. Extract validate/normalize/persist.
Bad: "Have you considered what happens if the API returns a 429? I think we should probably handle that case."
Good: L23: 🟡 risk: no retry on 429. Wrap in withBackoff(3).
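A minimal sketch of how a comment in the terse format above could be assembled. `ReviewComment` and `render` are hypothetical names used only for illustration; the `L<line>:` / `<file>:L<line>:` layout and the four severity prefixes are taken from the Rules section.

```python
from dataclasses import dataclass
from typing import Optional

# Severity prefixes as listed in the Rules section.
SEVERITY_PREFIX = {
    "bug": "🔴 bug:",
    "risk": "🟡 risk:",
    "nit": "🔵 nit:",
    "q": "❓ q:",
}


@dataclass
class ReviewComment:  # hypothetical helper, not part of this command
    line: str                       # "42" or a range such as "88-140"
    problem: str
    fix: str
    severity: Optional[str] = None  # optional, per "Severity prefix (optional, when mixed)"
    file: Optional[str] = None      # only needed for multi-file diffs

    def render(self) -> str:
        location = f"L{self.line}:"
        if self.file:
            location = f"{self.file}:{location}"
        parts = [location]
        if self.severity:
            parts.append(SEVERITY_PREFIX[self.severity])
        # Normalise trailing periods so each sentence ends with exactly one.
        parts.append(f"{self.problem.rstrip('.')}.")
        parts.append(f"{self.fix.rstrip('.')}.")
        return " ".join(parts)


comment = ReviewComment("42", "user can be null after .find()",
                        "Add guard before .email", severity="bug")
print(comment.render())
```

The printed line matches the first "Good" example above.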
Auto-Clarity
Drop terse mode for: security findings (CVE-class bugs need full explanation + reference), architectural disagreements (need rationale), onboarding contexts where author is new. Write normal paragraph, then resume terse.
Boundaries
Reviews only -- no code fix, no approve/request-changes, no linters. Output comment(s) ready to paste into PR. "stop caveman-review" or "normal mode": revert to verbose.
Command Arguments
Parse from $ARGUMENTS:
Argument Definitions
| Argument | Format | Default | Description |
|---|---|---|---|
| `review-aspects` | Free text | None | Optional focus areas (e.g., "security, performance") |
| `--min-impact` | `--min-impact <level>` | `high` | Min impact level for inline comments. Values: `critical`, `high`, `medium`, `medium-low`, `low` |
Impact Level Mapping
| Level | Impact Score Range |
|---|---|
| critical | 81-100 |
| high | 61-80 |
| medium | 41-60 |
| medium-low | 21-40 |
| low | 0-20 |
Configuration Resolution
Parse $ARGUMENTS, resolve config:
# Extract review aspects (free text, everything that is not a flag)
REVIEW_ASPECTS = all non-flag text from $ARGUMENTS
# Parse flags
MIN_IMPACT = --min-impact || "high"
# Resolve minimum impact score from level name
MIN_IMPACT_SCORE = lookup MIN_IMPACT in Impact Level Mapping:
"critical" -> 81
"high" -> 61
"medium" -> 41
"medium-low" -> 21
"low" -> 0
Review Workflow
Comprehensive PR review using multiple specialized agents. Follow steps precisely:
Phase 1: Preparation
Run commands in order:
- Determine Review Scope
  - Check changes (use commands returning line counts, not file content):
    - `git status`
    - `git diff --stat`
    - `git diff origin/master --stat` or `git diff origin/master...HEAD --stat` for PR diffs
      - change to `origin/main` if main is default branch
  - Parse `$ARGUMENTS` per Command Arguments section to resolve `REVIEW_ASPECTS`, `MIN_IMPACT`, `MIN_IMPACT_SCORE`
- Launch up to 6 parallel Haiku agents:
  - One agent: check if PR (a) closed, (b) draft. If so, stop -- PR not eligible for review.
  - One agent: search for file paths (not contents) of: CLAUDE.md, AGENTS.md, **/consitution.md, root README.md, plus README.md in directories whose files PR modified
  - Split files by line count across 1-4 agents, each given this prompt:

    GOAL: Analyse PR changes in following files and provide summary

    Perform following steps:
    - Run [pass proper git command that the agent can use] to see changes in files
    - Analyse following files: [list of files]

    Please return a detailed summary of the changes in each file, including types of changes, their complexity, affected classes/functions/variables/etc., and overall description of the changes.
- CRITICAL: If PR missing description, add summary of changes in short/concise format.
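One way to split files by line count across the summary agents is a greedy "largest file to the least loaded agent" pass. This is only a sketch: `split_files` is a hypothetical helper, the line counts would come from `git diff --stat`, and using one agent per file up to a cap of 4 is an assumption, since the source only gives the 1-4 range.

```python
import heapq


def split_files(line_counts: dict[str, int], max_agents: int = 4) -> list[list[str]]:
    """Assign each file to the agent with the fewest changed lines so far."""
    if not line_counts:
        return []
    agents = max(1, min(max_agents, len(line_counts)))
    heap = [(0, idx) for idx in range(agents)]  # (lines assigned, agent index)
    buckets: list[list[str]] = [[] for _ in range(agents)]
    # Largest files first; ties broken by path for deterministic output.
    for path, count in sorted(line_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        buckets[idx].append(path)
        heapq.heappush(heap, (total + count, idx))
    return buckets


changed = {"a.py": 400, "b.py": 300, "c.py": 200, "d.py": 100, "e.py": 100}
print(split_files(changed))
```

Each inner list is the `[list of files]` handed to one agent's prompt.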
Phase 2: Searching for Issues
Determine Applicable Reviews, then launch up to 6 parallel (Sonnet or Opus) agents to review all changes. Agents return list of issues + reason each was flagged (CLAUDE.md adherence, bug, historical context, etc.).
Available Review Agents:
- security-auditor - Security vulnerabilities
- bug-hunter - Bugs and silent failures
- code-quality-reviewer - Project guidelines, maintainability, quality, code simplification
- contracts-reviewer - Type design, invariants, API changes, data modeling
- test-coverage-reviewer - Test coverage quality and completeness
- historical-context-reviewer - Git blame, history, previous PRs touching these files
Default: run all applicable agents.
Determine Applicable Reviews
Based on Phase 1 changes summary and complexity:
- Code or config changes (not purely cosmetic): bug-hunter, security-auditor
- Code changes (business/infra logic, formatting): code-quality-reviewer
- Code or test files changed: test-coverage-reviewer
- Types, API, data modeling changed: contracts-reviewer
- High complexity or historical context needed: historical-context-reviewer
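The selection rules above can be sketched as a small function. `ChangeSummary` and its flags are hypothetical stand-ins for whatever the Phase 1 summaries report; only the agent names and the rule-to-agent mapping come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ChangeSummary:  # hypothetical flags derived from the Phase 1 summaries
    code_changed: bool = False
    config_changed: bool = False
    cosmetic_only: bool = False      # formatting-only changes
    tests_changed: bool = False
    contracts_changed: bool = False  # types, API, data modeling
    needs_history: bool = False      # high complexity or historical context needed


def applicable_agents(summary: ChangeSummary) -> list[str]:
    agents: list[str] = []
    if (summary.code_changed or summary.config_changed) and not summary.cosmetic_only:
        agents += ["bug-hunter", "security-auditor"]
    if summary.code_changed:  # includes formatting changes
        agents.append("code-quality-reviewer")
    if summary.code_changed or summary.tests_changed:
        agents.append("test-coverage-reviewer")
    if summary.contracts_changed:
        agents.append("contracts-reviewer")
    if summary.needs_history:
        agents.append("historical-context-reviewer")
    return agents


print(applicable_agents(ChangeSummary(code_changed=True, contracts_changed=True)))
```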
Launch Review Agents
Parallel approach:
- Launch all agents simultaneously
- Provide full list of modified files + PR summary as context, highlight which PR they review, include project guidelines files (README.md, CLAUDE.md, consitution.md if they exist)
- Results come back together
Phase 3: Confidence & Impact Scoring
- For each Phase 2 issue, launch parallel Haiku agent with PR, issue description, CLAUDE.md files list. Returns TWO scores:

  Confidence Score (0-100) - How real is issue (not false positive):

  a. 0: Not confident. False positive, doesn't hold up, or pre-existing issue.
  b. 25: Somewhat confident. Might be real, might be false positive. Agent couldn't verify. If stylistic, not explicitly in CLAUDE.md.
  c. 50: Moderately confident. Verified real issue, but nitpick or rare in practice. Not very important relative to rest of PR.
  d. 75: Highly confident. Double-checked, very likely real, will be hit in practice. Existing PR approach insufficient. Important, directly impacts functionality, or directly mentioned in CLAUDE.md.
  e. 100: Absolutely certain. Double-checked, confirmed real, happens frequently. Evidence directly confirms.

  Impact Score (0-100) - Severity if left unfixed:

  a. 0-20 (Low): Minor code smell or style inconsistency. No significant functionality/maintainability impact.
  b. 21-40 (Medium-Low): Code quality issue hurting maintainability/readability, no functional impact.
  c. 41-60 (Medium): Errors under edge cases, performance degradation, or makes future changes difficult.
  d. 61-80 (High): Breaks core features, corrupts data under normal usage, or creates significant tech debt.
  e. 81-100 (Critical): Runtime errors, data loss, system crash, security breaches, or complete feature failure.

  For CLAUDE.md-flagged issues, agent must double-check CLAUDE.md actually calls out that issue.
- Filter using progressive threshold table -- higher impact needs less confidence:

  | Impact Score | Min Confidence Required | Rationale |
  |---|---|---|
  | 81-100 (Critical) | 50 | Critical issues warrant investigation even with moderate confidence |
  | 61-80 (High) | 65 | High impact needs good confidence to avoid false alarms |
  | 41-60 (Medium) | 75 | Medium issues need high confidence to justify addressing |
  | 21-40 (Medium-Low) | 85 | Low-medium impact needs very high confidence |
  | 0-20 (Low) | 95 | Minor issues only if nearly certain |

  Filter out issues not meeting min confidence threshold for their impact level. If none meet criteria, do not proceed.

  Do NOT post inline comments for:
  - Issues below configured `MIN_IMPACT` level -- impact score below `MIN_IMPACT_SCORE` (from `--min-impact`, default: `high` / 61) excluded.
  - Low confidence issues -- below min confidence threshold for their impact level excluded entirely.

  Focus inline comments on issues at or above `MIN_IMPACT` level meeting confidence thresholds.
- Use Haiku agent to re-check PR eligibility from Phase 1 (in case updates since review started).
- Post Inline Comments Only (skip if no issues):

  a. Preferred -- Use MCP GitHub tools if available:
  - Use `mcp__github_inline_comment__create_inline_comment` for line-specific feedback per issue.

  b. Fallback -- direct API calls:
  - Check if `git:attach-review-to-pr` command available.
  - If available and issues found:
    - Multiple Issues: `gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews` with review body + comments array.
    - Single Issue: `gh api repos/{owner}/{repo}/pulls/{pr_number}/comments` for one line-specific comment.

When writing comments:
- Keep brief
- Use emojis
- Link and cite relevant code, files, URLs
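The Phase 3 filter (progressive confidence threshold plus the `MIN_IMPACT_SCORE` cut-off) can be sketched as below. The function names are hypothetical; the numbers are copied from the threshold table and the Impact Level Mapping.

```python
# (lower bound of impact range, minimum confidence required), highest range first.
THRESHOLDS = [(81, 50), (61, 65), (41, 75), (21, 85), (0, 95)]


def required_confidence(impact: int) -> int:
    """Look up the minimum confidence for an impact score of 0-100."""
    for floor, min_confidence in THRESHOLDS:
        if impact >= floor:
            return min_confidence
    raise ValueError(f"impact score out of range: {impact}")


def should_post(impact: int, confidence: int, min_impact_score: int = 61) -> bool:
    """An issue gets an inline comment only if it passes both filters."""
    return impact >= min_impact_score and confidence >= required_confidence(impact)


print(should_post(impact=85, confidence=50))   # critical, moderate confidence
print(should_post(impact=70, confidence=60))   # high impact, confidence too low
print(should_post(impact=45, confidence=90))   # medium, below default --min-impact high
```

With the default `--min-impact high`, only the first of the three calls passes.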
False Positives (Phase 3)
- Pre-existing issues
- Looks like bug but isn't
- Pedantic nitpicks senior engineer wouldn't call out
- Issues linter/typechecker/compiler would catch (imports, type errors, formatting, style). Assume CI runs these separately.
- General code quality issues (coverage, documentation) unless required in CLAUDE.md
- Issues called out in CLAUDE.md but explicitly silenced in code (lint ignore comment)
- Functionality changes likely intentional or related to broader change
- Real issues on lines user didn't modify
Notes:
- Use build, lint, test commands if available to find non-obvious issues
- Use `gh` for GitHub interaction (fetch PR, create inline comments), not web fetch
- Make todo list first
- Cite and link each bug (if referring to CLAUDE.md, link it)
- For line-specific comments via `git:attach-review-to-pr`:
  - Each issue maps to specific file + line number
  - Multiple issues: `gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews` with JSON containing review body (Quality Gate summary) + comments array
  - Single issue: `gh api repos/{owner}/{repo}/pulls/{pr_number}/comments`
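A hedged sketch of the JSON body for the multiple-issues case. `build_review_payload` and the sample issue are hypothetical; the `event`, `body`, and per-comment `path` / `line` / `side` / `body` fields follow the GitHub REST API for creating a pull request review. Using `event: "COMMENT"` is an assumption chosen to match the "no approve/request-changes" boundary.

```python
import json


def build_review_payload(summary: str, issues: list[dict]) -> dict:
    """Build the body for POST repos/{owner}/{repo}/pulls/{pr_number}/reviews."""
    return {
        "event": "COMMENT",  # assumption: comment only, never approve or request changes
        "body": summary,
        "comments": [
            {
                "path": issue["path"],
                "line": issue["line"],
                "side": "RIGHT",  # comment on the new version of the file
                "body": issue["body"],
            }
            for issue in issues
        ],
    }


payload = build_review_payload(
    "Quality Gate summary",
    [{"path": "src/user.ts", "line": 42,
      "body": "🟠 High: Potential null pointer dereference"}],  # illustrative issue
)
print(json.dumps(payload, indent=2, ensure_ascii=False))
```

Saved to a file, a payload like this could be sent with `gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews --method POST --input payload.json`.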
Template for line-specific review comments
Via git:attach-review-to-pr, use per issue:
````markdown
🔴/🟠/🟡/🟢 [Critical/High/Medium/Low]: [Brief description]

[Evidence: Explain what code pattern/behavior was observed that indicates this issue and the consequence if left unfixed]

[If applicable, provide code suggestion]:
```suggestion
[code here]
```
````

#### Example: Bug Issue

````markdown
🟠 High: Potential null pointer dereference

Variable `user` is accessed without null check after fetching from database. This will cause runtime error if user is not found, breaking the user profile feature.

```suggestion
if (!user) {
  throw new Error('User not found');
}
```
````

#### Example: Security Issue

````markdown
🔴 Critical: SQL Injection vulnerability

User input is directly concatenated into SQL query without sanitization. Attackers can execute arbitrary SQL commands, leading to data breach or deletion.

Use parameterized queries instead:
```suggestion
db.query('SELECT * FROM users WHERE id = ?', [userId])
```
````

### Template for inline comments using GitHub API
#### Multiple Issues (using `/reviews` endpoint)
Via `gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews`, each comment in `comments` array uses line-specific template above.
#### Single Issue (using `/comments` endpoint)
Via `gh api repos/{owner}/{repo}/pulls/{pr_number}/comments`, post one line-specific comment using template above.
**Linking to code:**
- Use full git SHA + line range, e.g. `https://github.com/owner/repo/blob/1d54823877c4de72b2316a64032a54afc404e619/README.md#L13-L17`
- Line range format: `L[start]-L[end]`
- Provide at least 1 line of context before and after
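A small sketch of building such a link. `permalink` is a hypothetical helper, and the 40-character check assumes a SHA-1 repository. The context padding follows the "at least 1 line before and after" rule.

```python
import re


def permalink(owner: str, repo: str, sha: str, path: str,
              start: int, end: int, context: int = 1) -> str:
    """Build a GitHub blob permalink covering lines start..end plus context."""
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError("use the full commit SHA, not a branch name or short hash")
    first = max(1, start - context)  # never go above the first line of the file
    last = end + context             # caller should cap this at the file length
    return f"https://github.com/{owner}/{repo}/blob/{sha}/{path}#L{first}-L{last}"


print(permalink("owner", "repo", "1d54823877c4de72b2316a64032a54afc404e619",
                "README.md", start=14, end=16))
```

For an issue on lines 14-16 this reproduces the example URL above.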
**Evaluation:**
- **Security First**: High or Critical security issue automatically becomes blocker
- **Quantify Everything**: Use numbers, not "some", "many", "few"
- **Skip Trivial Issues** in large PRs (>500 lines): Focus on architectural and security issues
#### No Issues Found
Do not post comments. Report to user no issues found.
## Remember
Goal: catch bugs and security issues, improve code quality while maintaining velocity. Be thorough but pragmatic. Focus on code safety and maintainability, not perfection.