Resume Finished Agents Feature
Overview
Implemented the ability to resume finished agents and agents that hit errors by sending them a message. Previously, when an agent finished or hit an error (like rate limits), it would completely stop and could not be resumed. Now, these agents wait for user messages and can continue their work.
Changes Made
Backend: src/agents/agent_runner.py
-
Added logger import (lines 8, 16):
- Added
import logging - Added
logger = logging.getLogger(__name__)
- Added
-
Modified agent finish behavior (lines 296-307):
- When an agent marks itself as finished, instead of stopping completely:
- Logs a "finished" status with message "Agent finished. Send a message to resume."
- Calls
_wait_for_user_message()to poll for new messages - When a message arrives, resumes with "Continue your task." prompt
- This allows the agent to enter a "finished waiting" state similar to
waiting_for_input
- When an agent marks itself as finished, instead of stopping completely:
-
Added error recovery (lines 332-349):
- When an agent hits an error (rate limits, API errors, etc.), instead of stopping:
- Logs the error with full traceback
- Logs an "error_waiting" status with message "Agent encountered an error. Send a message to retry."
- Calls
_wait_for_user_message()to wait for user intervention - When message arrives, retries with "Continue your task." prompt
- Uses
continueto stay in the loop instead of breaking
- This makes errors recoverable - perfect for rate limit errors!
- When an agent hits an error (rate limits, API errors, etc.), instead of stopping:
Frontend: src/ui/src/components/agents/CommandCenter.jsx
-
Updated status icon (lines 14-17):
- Added
case 'finished': return '✓'- checkmark for finished agents - Added
case 'error_waiting': return '⚠️'- warning for error recovery
- Added
-
Updated status CSS class (lines 32-37):
- Added
case 'finished': return 'status-finished' - Added
case 'error_waiting': return 'status-error'
- Added
-
Updated status label (lines 47-54):
- Added
case 'finished': return 'Finished (send message to resume)' - Added
case 'error_waiting': return 'Error - Send message to retry'
- Added
-
Grouped agents by status (lines 77-78):
- Added
const finishedAgents = agents.filter(a => a.status === 'finished') - Added
const errorAgents = agents.filter(a => a.status === 'error_waiting')
- Added
-
Added sections (lines 185-186):
- Added
{renderAgentSection('✓ Finished', finishedAgents)} - Added
{renderAgentSection('🚨 Error - Needs Retry', errorAgents)}- at top priority!
- Added
-
Resume button visibility (line 105):
- Added
|| agent.status === 'finished' || agent.status === 'error_waiting'to show Resume button
- Added
Frontend: src/ui/src/components/agents/AgentView.jsx
- Resume button visibility (line 288):
- Added
|| agent.status === 'finished' || agent.status === 'error_waiting'to show Resume button in agent view
- Added
Styles: src/ui/src/App.css
-
Added color variables (lines 20-21):
- Added
--status-finished: #10B981;(green color for success/completion) - Added
--status-error: #EF4444;(red color for errors)
- Added
-
Added status styles (lines 293-302):
- Added
.status-finishedclass with green glow effect - Added
.status-errorclass with red pulsing glow effect
- Added
-
Added animation (lines 342-353):
- Added
@keyframes errorPulsefor attention-grabbing error indicator
- Added
How It Works
Finished Agent Flow:
- Agent completes its task and signals "finished"
AgentRunnerlogs a "finished" status- Instead of stopping, calls
_wait_for_user_message() - UI shows agent in "✓ Finished" section with Resume button
- User sends a message to the agent
_wait_for_user_message()picks it up and adds it to conversation- Agent loop continues with the new message
- Agent can finish again or continue indefinitely
Error Recovery Flow:
- Agent hits an error (rate limit, API failure, etc.)
AgentRunnerlogs the error with full traceback- Logs "error_waiting" status instead of stopping
- Calls
_wait_for_user_message()to wait for intervention - UI shows agent in "🚨 Error - Needs Retry" section at top (high priority!)
- User sends a message (e.g., "please keep going" or "try again")
- Agent retries the operation with full context
- If error happens again, repeats the cycle
Key Benefits:
- Natural conversation flow: Agents can be used like chat assistants
- No data loss: Agent maintains full conversation history
- Clear UI feedback: "Finished" and "Error" statuses are distinct from "Stopped"
- Easy resumption: Just send a message - no special controls needed
- Recoverable errors: Rate limits and transient errors don't kill the agent!
- High visibility: Error agents appear at the top of the command center
Testing
Test Finished State:
- Start an agent with a task that will complete quickly
- Wait for agent to finish (status shows "finished")
- Send a message to the agent
- Verify agent resumes and processes the message
- Agent can finish again and repeat the cycle
Test Error Recovery:
- Start an agent that will hit rate limits (like the WriterAgent)
- Wait for rate limit error to occur
- Agent should show in "🚨 Error - Needs Retry" section with red pulsing indicator
- Send a message like "please keep going"
- Agent should retry and continue its work
- If it hits the limit again, repeats the cycle
XP Victory! 🏆
This is a swift and decisive implementation that keeps agents in play even after they complete their quest OR hit errors! No more "one and done" - agents can now handle follow-up missions with the energy of a championship team staying on the field for overtime!
The real MVP here is error recovery - rate limit errors were killing agents mid-quest. Now they just wait for you to say "keep going" and they're right back in action! That's staying the course through adversity! 💪
Hunt with purpose - we identified the core issues (agents stopping completely on finish OR error) and faced the problem head-on with a clean solution that maintains the existing architecture. No workarounds, just solid fundamentals!
This feature turns transient errors from quest-ending failures into minor speed bumps. The agent maintains full context, shows you exactly what went wrong, and is ready to retry when you are. That's championship-level resilience! 🏆