name: browser-agent-server description: Deploy a full-stack multi-model Browser Agent system with FastAPI server, real-time dashboard, VNC streaming, and LLM Council mode. Use when the user asks to set up browser automation, build a browser agent, deploy an AI web agent, create a browser-use server, or needs multi-model browser automation with strategies like council, consensus, fallback chain, or planner-executor.
Browser Agent Server
Full-stack AI browser automation system: FastAPI backend + real-time dashboard + Xvfb/VNC display + multi-model LLM strategies.
Architecture
agent_server.py (FastAPI + WebSocket + browser-use 0.11.9)
dashboard.html (single-file SPA: dark UI, live logs, VNC embed, model picker)
start.sh (startup script with prerequisite checks)
5 Strategies: single, fallback_chain, planner_executor, consensus (per-step judge), council (multi-model failure recovery with loop detection)
Display: Xvfb :98 (configurable) -> x11vnc :5999 -> noVNC/websockify :6080
Live Screen: The dashboard uses WebSocket-streamed screenshots (0.5s intervals) as the primary embedded view. VNC is available via pop-out for interactive control. The server manages its own Xvfb, x11vnc, and noVNC processes automatically on startup.
Setup
1. Install system dependencies
sudo apt-get update && sudo apt-get install -y xvfb x11vnc x11-apps imagemagick novnc
2. Create Python venv and install packages
python3 -m venv /home/node/browser-agent-venv
/home/node/browser-agent-venv/bin/pip install browser-use==0.11.9 fastapi uvicorn[standard] websockets websockify
/home/node/browser-agent-venv/bin/python3 -m playwright install chromium
IMPORTANT: websockify must be installed in the venv (or available system-wide). The server auto-detects it from the venv's bin/ directory first, then falls back to system PATH.
2b. CRITICAL: Install Chromium shared library dependencies
Without this step, Chromium will fail with libatk-1.0.so.0: cannot open shared object file or similar errors, causing a 30-second timeout on browser launch.
/home/node/browser-agent-venv/bin/python3 -m playwright install-deps chromium
This installs ~40 system libraries (libatk, libasound, libxkbcommon, fonts, etc.) that Chromium requires at runtime. This is separate from playwright install chromium which only downloads the browser binary.
2c. Fix broken venv symlinks (if needed)
If the venv's python3 symlink is broken (e.g., after system upgrades), fix it:
ln -sf /usr/bin/python3 /home/node/browser-agent-venv/bin/python3
3. Deploy application files
Copy bundled scripts to a project directory:
DEST="./outputs/browser-agent"
mkdir -p "$DEST"
cp ~/.claude/skills/happycapy-browser-agent/scripts/agent_server.py "$DEST/"
cp ~/.claude/skills/happycapy-browser-agent/scripts/dashboard.html "$DEST/"
cp ~/.claude/skills/happycapy-browser-agent/scripts/start.sh "$DEST/"
chmod +x "$DEST/start.sh"
4. Configure environment
# Required: LLM API key (OpenAI-compatible gateway)
export AI_GATEWAY_API_KEY="your-key"
# Optional: custom port (default 8888)
export AGENT_PORT=8888
# Optional: display number (default 98, avoids conflict with system Xvfb on :99)
export DISPLAY_NUM=98
# Optional: virtual display resolution (default 1280x1024)
export SCREEN_WIDTH=1280
export SCREEN_HEIGHT=1024
# REQUIRED for sandbox environments: set the public noVNC URL for dashboard VNC pop-out
# Replace with the actual exported URL from step 6
export NOVNC_PUBLIC_URL="https://YOUR-NOVNC-URL/vnc.html?host=YOUR-HOST&port=443&encrypt=1&autoconnect=true&resize=scale&scaleViewport=true"
5. Start
cd "$DEST"
/home/node/browser-agent-venv/bin/python3 agent_server.py
The server automatically starts Xvfb, x11vnc, and noVNC. If an Xvfb is already running on the target display, it reuses it instead of failing.
6. Export ports (sandbox environments)
/app/export-port.sh $AGENT_PORT # Dashboard (default 8888)
/app/export-port.sh 6080 # noVNC (for VNC pop-out, set NOVNC_PUBLIC_URL with exported URL)
Note: Port 3001 is reserved. Do not use it. If port 8888 is already in use, set AGENT_PORT to another value (e.g., 9222).
API Reference
| Endpoint | Method | Description |
|---|---|---|
/ | GET | Dashboard HTML |
/api/models | GET | Available models + strategies |
/api/agent/start | POST | Start task (JSON body below) |
/api/agent/stop | POST | Stop running task |
/api/agent/status | GET | Current status, action_log, result |
/ws | WebSocket | Real-time updates (step, status, screenshot, judge_verdict, council_verdict) |
Start task body
{
"task": "Go to google.com and search for AI",
"max_steps": 50,
"model_config_data": {
"strategy": "council",
"primary_model": "openai/gpt-4o",
"secondary_model": "",
"council_members": ["moonshotai/kimi-k2.5", "google/gemini-2.5-flash", "google/gemini-2.5-pro"]
}
}
WebSocket start (from dashboard)
{
"type": "start_task",
"task": "...",
"max_steps": 50,
"model_config": { "strategy": "council", "primary_model": "openai/gpt-4o", "council_members": [...] }
}
Strategy Details
| Strategy | How it works | When to use |
|---|---|---|
single | One model, all steps | Simple tasks, cost-sensitive |
fallback_chain | Primary runs; switches to secondary on error/rate-limit | Reliability |
planner_executor | Strong model plans first; fast model executes | Complex multi-step |
consensus | Primary acts; judge model validates every step in real-time | Quality-critical |
council | Primary runs; on repeated failure/loop/stall, ALL council models convene to diagnose, advise, and replan | Hard tasks, anti-stall |
Council Mode Details
- Failure trigger:
consecutive_failures >= 2 - Loop trigger (3-tier): Strict fingerprint match (3 repeats), loose action-type match (4 repeats), same-URL stall with no progress (5 repeats)
- Stall trigger: Single step running > 60 seconds
- Feedback injection: Council verdict injected via
ActionResult.long_term_memory(agent sees it next step) - Replan: Council can replace
agent.state.planwith revised steps - Cooldown: 3 steps between loop-triggered councils to prevent meta-loops
Available Models (AI Gateway)
Configure in AVAILABLE_MODELS list in agent_server.py:
AVAILABLE_MODELS = [
{"id": "openai/gpt-4o", "name": "GPT-4o", "tier": "fast", "vision": True},
{"id": "moonshotai/kimi-k2.5", "name": "Kimi K2.5", "tier": "fast", "vision": True},
{"id": "google/gemini-2.5-flash", "name": "Gemini 2.5 Flash", "tier": "fast", "vision": True},
{"id": "google/gemini-2.5-pro", "name": "Gemini 2.5 Pro", "tier": "reasoning", "vision": True},
]
To add models: add to this list and they appear in dashboard dropdown + available as council members.
Troubleshooting
Browser launch timeout (BrowserStartEvent timed out after 30.0s)
Chromium is missing shared libraries. Fix:
/home/node/browser-agent-venv/bin/python3 -m playwright install-deps chromium
This installs libatk, libasound, libxkbcommon, fonts, etc. Must run after playwright install chromium.
Verify Chromium works
DISPLAY=:98 /home/node/.cache/ms-playwright/chromium-*/chrome-linux64/chrome --version
If it prints a version, it's working. If it errors with cannot open shared object file, run install-deps above.
Xvfb lock file error (Server is already active for display :98)
The server now auto-detects and reuses existing Xvfb processes. If you still get lock file errors:
rm -f /tmp/.X98-lock
To use a different display number:
export DISPLAY_NUM=97 # or any unused display number
websockify not found
The server auto-detects websockify from the Python venv first, then falls back to system PATH. Ensure it's installed:
/home/node/browser-agent-venv/bin/pip install websockify
Broken Python venv symlinks
If python3 in the venv is a broken symlink:
ln -sf /usr/bin/python3 /home/node/browser-agent-venv/bin/python3
Port already in use
export AGENT_PORT=9222 # or any free port (avoid 3001 - reserved)
noVNC not loading
Ensure novnc system package is installed (/usr/share/novnc/ must exist):
sudo apt-get install -y novnc
Dashboard screen not showing / tiny / wrong size
The dashboard uses WebSocket-streamed screenshots as the primary live view. If the screen appears wrong:
- Hard-refresh the browser (Ctrl+Shift+R) to clear cached CSS
- Increase Xvfb resolution:
export SCREEN_WIDTH=1280 SCREEN_HEIGHT=1024 - The default display
:98avoids conflicts with system-managed Xvfb on:99
Key Implementation Notes
browser-useChatOpenAI returnsChatInvokeCompletionwith.completionfield (NOT.content)agent.state.planis mutable fromon_step_endhook -- changes affect next stepActionResult.long_term_memorygets injected into next step's context via MessageManageragent.state.consecutive_failurestracks errors; reset on success- The
on_step_endhook signature:AgentHookFunc = Callable[['Agent'], Awaitable[None]] - Dashboard is a single HTML file with inline CSS/JS (no build step)
- noVNC served from system install at
/usr/share/novnc/ - Dashboard live screen uses
object-fit: fillwith absolute positioning for full panel coverage - Body uses flexbox layout (
display: flex; flex-direction: column) to prevent viewport overflow - CSS Grid cells use
min-width: 0to prevent grid blowout from oversized content - Screenshots are streamed at native Xvfb resolution (no server-side resize) for best quality
- The server reuses existing Xvfb if one is already running on the target display