QAgent - AGENTS.md
AI Agent Guide: This file is the primary reference for AI coding agents working on QAgent. Read this before starting any work.
Project Overview
QAgent is a self-healing QA agent that automatically tests web applications, identifies bugs, applies fixes, and verifies the fixes, all without human intervention. It runs as a closed loop that iterates until all tests pass or the orchestrator's iteration limit is reached.
The QAgent Loop
┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│  TESTER  │───▶│  TRIAGE  │───▶│  FIXER   │───▶│ VERIFIER │
│  Agent   │    │  Agent   │    │  Agent   │    │  Agent   │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
     │                                               │
     │              ┌──────────────┐                 │
     │              │    Redis     │◀────────────────┘
     │              │  (Knowledge  │
     │              │    Base)     │
     │              └──────────────┘
     │                     │
     ▼                     ▼
┌──────────────────────────────────────────────────────────┐
│                W&B Weave (Observability)                 │
└──────────────────────────────────────────────────────────┘
- Tester Agent runs E2E tests using Browserbase + Stagehand
- Triage Agent diagnoses failures and queries the knowledge base
- Fixer Agent generates code patches using LLM + past fix patterns
- Verifier Agent applies patches, deploys via Vercel, and re-runs tests
- Knowledge Base (Redis) stores successful fixes for future reference
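The control flow described above can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not the orchestrator's actual code: the `LoopAgents` interface, the `runQAgentLoop` function, and the simplified report shapes are all hypothetical (the project's real types live in `lib/types.ts`), and stub agents stand in for Browserbase, the LLM, and Vercel.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface FailureReport {
  testName: string;
  error: string;
}

interface DiagnosisReport {
  failure: FailureReport;
  rootCause: string;
}

interface Patch {
  diff: string;
}

interface LoopAgents {
  runTests(): Promise<FailureReport[]>;
  diagnose(failure: FailureReport): Promise<DiagnosisReport>;
  generatePatch(diagnosis: DiagnosisReport): Promise<Patch>;
  applyAndDeploy(patch: Patch): Promise<void>;
}

interface LoopResult {
  passed: boolean;
  iterations: number; // number of test runs performed
}

// Repeat test -> triage -> fix -> verify until the suite is green
// or the iteration limit is reached.
async function runQAgentLoop(agents: LoopAgents, maxIterations: number): Promise<LoopResult> {
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const failures = await agents.runTests();
    if (failures.length === 0) {
      return { passed: true, iterations: iteration };
    }
    for (const failure of failures) {
      const diagnosis = await agents.diagnose(failure);
      const patch = await agents.generatePatch(diagnosis);
      await agents.applyAndDeploy(patch);
    }
  }
  // Patches applied in the final iteration are not re-tested in this sketch.
  return { passed: false, iterations: maxIterations };
}

// Stub agents: the first test run reports one failure, the second run is clean.
let remainingFailures = 1;
const stubAgents: LoopAgents = {
  runTests: async () =>
    remainingFailures > 0 ? [{ testName: 'checkout', error: 'button not found' }] : [],
  diagnose: async (failure) => ({ failure, rootCause: 'missing onClick handler' }),
  generatePatch: async () => ({ diff: '--- a/page.tsx\n+++ b/page.tsx\n' }),
  applyAndDeploy: async () => {
    remainingFailures -= 1;
  },
};
```

With these stubs, `runQAgentLoop(stubAgents, 3)` resolves to `{ passed: true, iterations: 2 }`: one failing run that triggers a patch, then one clean run.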
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 14 (App Router), React 18, TypeScript | Demo app and dashboard UI |
| Styling | Tailwind CSS, Radix UI | Component styling and UI primitives |
| Browser Automation | Browserbase + Stagehand | AI-powered E2E testing |
| Deployment | Vercel | Instant deployment after fixes |
| Vector Memory | Redis Stack (with vector search) | Store failure traces and enable semantic lookup |
| Observability | W&B Weave | Trace agent runs, log metrics, evaluate improvements |
| Dashboard | Marimo | Interactive analytics and live visualization |
| LLM | OpenAI / Google Gemini / Anthropic | Patch generation and diagnosis |
| Authentication | GitHub OAuth | Dashboard access control |
| Mobile | React Native (Expo) | Mobile companion app |
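To make the "semantic lookup" in the Vector Memory row concrete, here is a small in-memory sketch of similarity ranking over embeddings. It is illustrative only: in QAgent the search runs inside Redis Stack, and the `StoredFix` shape and both function names are assumptions made for this example. Cosine similarity is one of the distance metrics Redis vector search supports.

```typescript
// Hypothetical record shape; the real schema is defined by the project.
interface StoredFix {
  id: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions must match');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return 0; // a zero vector has no direction, so treat it as dissimilar
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k stored fixes whose embeddings are closest to the query embedding.
function findMostSimilar(query: number[], fixes: StoredFix[], k: number): StoredFix[] {
  return [...fixes]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k);
}
```

A linear scan like this is only workable for a handful of records; a vector index is what keeps lookups fast as the knowledge base grows.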
Project Structure
QAgent/
├── .claude/
│   └── skills/  # Domain-specific knowledge modules
│       ├── browserbase-stagehand/  # Browser automation patterns
│       ├── redis-vectorstore/  # Vector embeddings, semantic search
│       ├── vercel-deployment/  # Programmatic deployments
│       ├── wandb-weave/  # Tracing and evaluation
│       ├── google-adk/  # ADK/A2A integration patterns
│       ├── marimo-dashboards/  # Reactive notebooks
│       └── qagent-agents/  # Agent implementation patterns
├── agents/  # Agent implementations
│   ├── analyzer/  # Run analysis and summarization
│   ├── crawler/  # Autonomous crawl and discovery flows
│   ├── tester/  # E2E test execution with Stagehand
│   ├── triage/  # Failure diagnosis and root cause analysis
│   ├── fixer/  # LLM-powered patch generation
│   ├── verifier/  # Patch application and deployment
│   ├── orchestrator/  # Workflow coordination (main entry point)
│   └── adk/  # ADK workflow & agents (planned integration)
├── app/  # Next.js App Router
│   ├── api/  # API routes (auth, runs, patches, tests, webhooks)
│   ├── dashboard/  # Dashboard UI pages
│   └── layout.tsx  # App shell and metadata
├── components/  # React components
│   ├── dashboard/  # Dashboard-specific components
│   ├── diagnostics/  # Diagnostic views
│   ├── monitoring/  # Monitoring components
│   ├── onboarding/  # First-run guidance and setup
│   ├── patches/  # Patch management UI
│   ├── runs/  # Run tracking components
│   ├── ui/  # Shared UI components (shadcn/ui style)
│   └── voice/  # Voice interface components
├── lib/  # Shared libraries
│   ├── auth/  # Authentication utilities (GitHub OAuth)
│   ├── browserbase/  # Browser automation utilities
│   ├── dashboard/  # Dashboard data helpers
│   ├── git/  # Local git workflow helpers
│   ├── github/  # GitHub API integration
│   ├── hooks/  # React hooks
│   ├── notifications/  # Toasts and notification helpers
│   ├── providers/  # React providers
│   ├── queue/  # Job queue processing
│   ├── redis/  # Redis vector store client
│   ├── redteam/  # Adversarial testing suite
│   ├── tracetriage/  # Trace analysis and self-improvement
│   ├── utils/  # Shared utilities
│   └── weave/  # W&B Weave logging and tracing
├── mobile/  # React Native mobile app
├── dashboard/  # Marimo analytics dashboard (app.py)
├── docs/  # Documentation
│   ├── PRD.md  # Product Requirements Document
│   ├── DESIGN.md  # System design and data structures
│   ├── ARCHITECTURE.md  # Architecture Decision Records
│   ├── DEMO_SCRIPT.md  # 3-minute demo script
│   └── SPONSOR_INTEGRATIONS.md  # Sponsor integration details
├── prompts/  # Agent prompts
├── scripts/  # Build/deploy helper scripts
├── tests/
│   ├── e2e/  # E2E test specs and runner
│   └── unit/  # Vitest unit tests
├── middleware.ts  # Next.js auth middleware
└── .env.example  # Environment variable template
Build, Test, and Development Commands
# Install dependencies
pnpm install
# Development server (demo app)
pnpm dev # Starts Next.js dev server on localhost:3000
# Agent workflow
pnpm run agent # Start the QAgent orchestrator
# Testing
pnpm test # Run unit tests with Vitest
pnpm run test:e2e # Execute E2E flows via tests/e2e/runner.ts
# Code quality
pnpm lint # Run ESLint + TypeScript type-check
pnpm format # Format with Prettier
pnpm format:check # Check formatting without modifying files
# Production
pnpm build # Build for production
pnpm start # Start production server
# Dashboard
pnpm dashboard # Launch Marimo dashboard
# Redis
pnpm redis:init # Initialize Redis schema
Configuration Files
| File | Purpose |
|---|---|
| `package.json` | pnpm workspace configuration, scripts, dependencies |
| `tsconfig.json` | TypeScript compiler options (strict mode, path aliases) |
| `next.config.js` | Next.js configuration (React StrictMode) |
| `tailwind.config.js` | Tailwind CSS theme, colors, animations |
| `vitest.config.ts` | Vitest test configuration |
| `.eslintrc.json` | ESLint rules (extends next/core-web-vitals) |
| `.prettierrc` | Prettier formatting rules |
| `middleware.ts` | Next.js auth middleware (GitHub OAuth session validation) |
Coding Style & Naming Conventions
- Formatter: Prettier is the source of truth
  - `tabWidth: 2`
  - `singleQuote: true`
  - `semi: true`
  - `trailingComma: es5`
  - `printWidth: 100`
- TypeScript: Strict mode enabled
  - Avoid `any` unless absolutely justified
  - Use explicit return types for public methods
  - Prefer interfaces over types for object shapes
- Naming:
  - PascalCase for components, classes, interfaces
  - camelCase for variables, functions, methods
  - UPPER_SNAKE_CASE for constants
  - kebab-case for file names
- File Organization:
  - One class per file for agents
  - Co-locate related types in `lib/types.ts`
  - Use path aliases (`@/`) for imports
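As a quick illustration of these conventions in one place, the sketch below uses invented names (`RetryPolicy`, `RetryScheduler`, `MAX_RETRY_ATTEMPTS`), none of which exist in the repo. Following the file naming rule, it would live in a kebab-case file such as `retry-scheduler.ts`.

```typescript
// In project code, shared types would be imported through the path alias, e.g.:
//   import type { FailureReport } from '@/lib/types';

// UPPER_SNAKE_CASE for constants.
const MAX_RETRY_ATTEMPTS = 3;

// PascalCase interface, preferred over a type alias for an object shape.
interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
}

// PascalCase class, one class per file for agents.
class RetryScheduler {
  private readonly policy: RetryPolicy;

  constructor(policy: RetryPolicy) {
    this.policy = policy;
  }

  // camelCase method with an explicit return type.
  // Returns the delay before the given attempt, or null once attempts are exhausted.
  public getDelayMs(attempt: number): number | null {
    if (attempt >= this.policy.maxAttempts) {
      return null;
    }
    return this.policy.baseDelayMs * (attempt + 1);
  }
}

// camelCase for variables.
const scheduler = new RetryScheduler({ maxAttempts: MAX_RETRY_ATTEMPTS, baseDelayMs: 100 });
```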
Testing Guidelines
Unit Tests
- Location: `tests/unit/`
- Framework: Vitest
- Pattern: `*.test.ts`
- Run: `pnpm test`
- Coverage: Configured for `agents/**/*.ts` and `lib/**/*.ts`
E2E Tests
- Location: `tests/e2e/`
- Test specs: `tests/e2e/specs.ts`
- Runner: `tests/e2e/runner.ts`
- Run: `pnpm run test:e2e`
- Framework: Stagehand (AI-powered browser automation)
Environment Variables
Copy .env.example to .env.local and fill in required values:
Required for Core Functionality
| Variable | Description |
|---|---|
| `BROWSERBASE_API_KEY` | Browserbase API key for browser automation |
| `BROWSERBASE_PROJECT_ID` | Browserbase project identifier |
| `OPENAI_API_KEY` | OpenAI API key for LLM patch generation |
| `REDIS_URL` | Redis connection string (local or Redis Cloud) |
| `VERCEL_TOKEN` | Vercel API token for deployments |
| `VERCEL_PROJECT_ID` | Vercel project identifier |
| `WANDB_API_KEY` | Weights & Biases API key for Weave |
Required for Dashboard
| Variable | Description |
|---|---|
| `GITHUB_CLIENT_ID` | GitHub OAuth App client ID |
| `GITHUB_CLIENT_SECRET` | GitHub OAuth App client secret |
| `SESSION_SECRET` | Session encryption key (generate with `openssl rand -hex 32`) |
Optional
| Variable | Description |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key (backup LLM) |
| `GOOGLE_API_KEY` | Google API key for Gemini models |
| `GITHUB_TOKEN` | GitHub token for code operations |
| `DATABASE_URL` | PostgreSQL connection string |
| `SLACK_BOT_TOKEN` | Slack notifications |
| `LINEAR_API_KEY` | Linear issue tracking |
Security Note: Never commit .env.local to version control.
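A startup check for the core variables might look like the sketch below. The helper name `findMissingEnvVars` is hypothetical and not part of the codebase; the variable names are copied from the "Required for Core Functionality" table above.

```typescript
// Variable names taken from the "Required for Core Functionality" table.
const REQUIRED_CORE_VARS = [
  'BROWSERBASE_API_KEY',
  'BROWSERBASE_PROJECT_ID',
  'OPENAI_API_KEY',
  'REDIS_URL',
  'VERCEL_TOKEN',
  'VERCEL_PROJECT_ID',
  'WANDB_API_KEY',
] as const;

// Hypothetical helper: returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_CORE_VARS.filter((name) => (env[name] ?? '').trim() === '');
}
```

In a Node entry point this could be called as `findMissingEnvVars(process.env)`, failing fast with a message that lists every missing name instead of surfacing one error at a time later in the run.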
Agent Architecture
Tester Agent (agents/tester/)
- Executes E2E tests using Stagehand + Browserbase
- Captures screenshots, DOM snapshots, console logs on failure
- Generates structured `FailureReport` objects
- Instrumented with W&B Weave for observability
Triage Agent (agents/triage/)
- Classifies failures: `UI_BUG`, `BACKEND_ERROR`, `DATA_ERROR`, `TEST_FLAKY`, `UNKNOWN`
- Localizes bugs to file/line using error patterns + LLM
- Queries Redis for similar past issues
- Generates `DiagnosisReport` with root cause analysis
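The pattern-based part of classification could be sketched as follows. The five category names come from this guide; the regular expressions and the `classifyFailure` helper are illustrative assumptions, and the real agent also consults an LLM and the Redis knowledge base.

```typescript
type FailureCategory = 'UI_BUG' | 'BACKEND_ERROR' | 'DATA_ERROR' | 'TEST_FLAKY' | 'UNKNOWN';

interface CategoryRule {
  category: FailureCategory;
  pattern: RegExp;
}

// Hypothetical heuristics, checked in order; the first matching rule wins.
const CATEGORY_RULES: CategoryRule[] = [
  { category: 'TEST_FLAKY', pattern: /timed? ?out|ETIMEDOUT|ECONNRESET/i },
  { category: 'BACKEND_ERROR', pattern: /\b5\d{2}\b|internal server error/i },
  { category: 'DATA_ERROR', pattern: /cannot read propert(y|ies) of (undefined|null)/i },
  { category: 'UI_BUG', pattern: /element not found|not clickable|selector/i },
];

// Falls back to UNKNOWN when no pattern matches, leaving the case for LLM diagnosis.
function classifyFailure(errorMessage: string): FailureCategory {
  const match = CATEGORY_RULES.find((rule) => rule.pattern.test(errorMessage));
  return match ? match.category : 'UNKNOWN';
}
```

Cheap pattern checks like these can settle the obvious cases before any LLM call is made, with `UNKNOWN` acting as the signal that deeper analysis is needed.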
Fixer Agent (agents/fixer/)
- Generates minimal, targeted code patches
- Uses LLM with few-shot examples from knowledge base
- Validates patches for safety and syntax
- Produces unified diff format
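One way the safety validation step could work on a unified diff is to scan only the added lines for dangerous patterns. This is a sketch under assumptions: `checkPatchSafety` and `PatchCheckResult` are hypothetical names, and the deny-list contains only the three patterns this guide names in its security rules (`eval`, `exec`, `rm -rf`).

```typescript
// Patterns named in this guide's security rules; a real deny-list would likely be longer.
// Note that /\bexec\s*\(/ also matches harmless calls such as RegExp.prototype.exec.
const DANGEROUS_PATTERNS: RegExp[] = [/\beval\s*\(/, /\bexec\s*\(/, /rm\s+-rf\b/];

interface PatchCheckResult {
  safe: boolean;
  violations: string[];
}

// Scans only the lines a unified diff adds ("+" prefix), skipping the "+++ " file header.
function checkPatchSafety(unifiedDiff: string): PatchCheckResult {
  const addedLines = unifiedDiff
    .split('\n')
    .filter((line) => line.startsWith('+') && !line.startsWith('+++ '));
  const violations = addedLines.filter((line) =>
    DANGEROUS_PATTERNS.some((pattern) => pattern.test(line))
  );
  return { safe: violations.length === 0, violations };
}
```

Removed lines are ignored on purpose: a patch that deletes an `eval` call should not be rejected for mentioning it.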
Verifier Agent (agents/verifier/)
- Applies patches to filesystem
- Creates backups and handles rollback
- Validates TypeScript/JSX syntax
- Deploys to Vercel and re-runs tests
- Records successful fixes in Redis
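The backup-and-rollback behaviour can be illustrated with an in-memory stand-in for the filesystem. Everything here is hypothetical (`FileStore`, `applyWithRollback`, and the injected `validate` callback); the actual Verifier works on real files and also handles deployment.

```typescript
// A Map from path to contents stands in for the filesystem so the sketch is self-contained.
type FileStore = Map<string, string>;

// Hypothetical helper: writes new contents, validates them, and restores the backup on failure.
function applyWithRollback(
  files: FileStore,
  path: string,
  newContents: string,
  validate: (contents: string) => boolean
): boolean {
  const backup = files.get(path); // undefined when the file did not exist before
  files.set(path, newContents);
  if (validate(newContents)) {
    return true;
  }
  // Roll back: restore the previous contents, or remove a file the patch created.
  if (backup !== undefined) {
    files.set(path, backup);
  } else {
    files.delete(path);
  }
  return false;
}
```

Taking the backup before the write, and restoring it on any validation failure, is what keeps a bad patch from leaving the working tree in a half-applied state.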
Orchestrator (agents/orchestrator/)
- Coordinates the full QAgent loop
- Handles iteration limits and failure recovery
- Logs metrics to Weave
- Entry point: `pnpm run agent`
Development Workflow (Ralph Loop)
Follow this iterative workflow for development:
1. Read - Load `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, and relevant skills
2. Analyze - Understand current phase requirements
3. Plan - Break down into small, testable increments
4. Execute - Implement one increment at a time
5. Validate - Test, lint, verify acceptance criteria
6. Loop - Update documentation as needed, commit, and return to step 1
Security & Safety Guidelines
Always
- Keep secrets out of version control
- Validate all patches for dangerous patterns (`eval`, `exec`, `rm -rf`)
- Use parameterized queries for database access
- Sanitize user inputs in RedTeam tests
- Verify GitHub webhook signatures
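For the webhook rule above, a minimal verification sketch using Node's built-in `crypto` module is shown below. GitHub signs the raw request body with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The function name is an assumption, and how the secret is stored and how the raw body is read depend on the route handler.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical helper: checks an X-Hub-Signature-256 header value against the raw body.
function isValidGitHubSignature(
  rawBody: string,
  signatureHeader: string,
  secret: string
): boolean {
  const expected = 'sha256=' + createHmac('sha256', secret).update(rawBody).digest('hex');
  const expectedBuffer = Buffer.from(expected);
  const receivedBuffer = Buffer.from(signatureHeader);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (expectedBuffer.length !== receivedBuffer.length) {
    return false;
  }
  return timingSafeEqual(expectedBuffer, receivedBuffer);
}
```

The signature must be computed over the exact bytes received, so the body should be read as raw text before any JSON parsing. The constant-time comparison avoids leaking how many leading characters of a forged signature were correct.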
Never
- Hardcode secrets or credentials
- Deploy untested patches to production
- Ignore Redis lookup results when they are available
- Skip Weave logging for agent runs
- Commit broken code
Key Files for AI Agents
| File | Purpose |
|---|---|
| `AGENTS.md` | Primary repo guide for coding agents |
| `CLAUDE.md` | Detailed tech stack, phase roadmap, always/never rules |
| `GEMINI.md` | Compact project context for Gemini CLI |
| `lib/types.ts` | All TypeScript interfaces and types |
| `prompts/ralph-loop.md` | Development workflow prompts |
| `.claude/skills/` | Domain-specific implementation guides |
Dependencies
Production
- `next` - Next.js framework
- `@browserbasehq/stagehand` - AI browser automation
- `redis` - Redis client with vector search
- `weave` - W&B Weave observability
- `openai` - OpenAI SDK
- `@radix-ui/*` - Headless UI components
- `framer-motion` - Animations
- `recharts` - Charts for dashboard
- `lucide-react` - Icons
Development
- `vitest` - Unit testing
- `typescript` - Type checking
- `eslint` - Linting
- `prettier` - Formatting
- `tsx` - TypeScript execution
Troubleshooting
Common Issues
Stagehand initialization fails
- Verify `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID`
- Check Browserbase dashboard for session limits
Redis connection errors
- For local: ensure Redis Stack is running (`redis-server`)
- For cloud: verify `REDIS_URL` format
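A quick sanity check on the `REDIS_URL` shape might look like this sketch. The helper name is hypothetical; it only confirms that the value parses as a URL with a `redis://` or `rediss://` (TLS) scheme and a host, not that the server is reachable.

```typescript
// Hypothetical helper: true when the value looks like redis[s]://[user:password@]host[:port].
function isLikelyValidRedisUrl(value: string): boolean {
  try {
    const url = new URL(value);
    const hasRedisScheme = url.protocol === 'redis:' || url.protocol === 'rediss:';
    return hasRedisScheme && url.hostname.length > 0;
  } catch {
    // new URL() throws on strings it cannot parse at all.
    return false;
  }
}
```

A value such as `localhost:6379` fails this check because it has no scheme, which is a common cause of connection errors when copying host and port from a dashboard.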
TypeScript errors after patch
- Fixer Agent may generate type-incorrect code
- Type errors are allowed; syntax errors are blocked
- Check `pnpm lint` output
Vercel deployment fails
- Verify `VERCEL_TOKEN` and `VERCEL_PROJECT_ID`
- Check git working directory is clean
References
- QAgent Paper - Five-step agentic patching framework
- Stagehand Docs - AI-powered browser automation
- Browserbase Docs - Cloud browser infrastructure
- Redis Vector Search - Semantic similarity
- W&B Weave - LLM observability
- Marimo - Reactive Python notebooks
Last updated: March 2026