Managed Agents Deep Audit (phase-4.11.1)
URL coverage
All 9 canonical Managed Agents pages were fetched in full via WebFetch. No
linked /v1/agents, /v1/sessions, or /v1/environments reference pages
were available as standalone targets beyond what the guides inline;
every endpoint surface the guides use is documented below.
- overview, quickstart, agent-setup, sessions, skills, tools, memory, files, vaults: all retrieved 2026-04-18.
- Features are beta-gated (managed-agents-2026-04-01 header on every request). Memory, outcomes, and multi-agent are Research Preview and require a separate access form.
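A minimal sketch of keeping the beta flag on every call. It assumes the flag travels in the standard anthropic-beta header alongside the usual x-api-key and anthropic-version headers; the audited pages give only the flag value. managed_agents_headers is a hypothetical helper, not part of any SDK.

```python
from typing import Dict, Tuple

# Flag value from the audited pages; the header names below are assumptions.
MANAGED_AGENTS_BETA = "managed-agents-2026-04-01"


def managed_agents_headers(api_key: str, extra_betas: Tuple[str, ...] = ()) -> Dict[str, str]:
    """Hypothetical helper: build headers with the Managed Agents beta always first."""
    # De-duplicate so callers can pass the flag again without doubling it.
    betas = [MANAGED_AGENTS_BETA] + [b for b in extra_betas if b != MANAGED_AGENTS_BETA]
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": ",".join(betas),
        "content-type": "application/json",
    }
```

Centralising the headers in one function makes it hard for a new call site to forget the gate.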
Per-page digests
overview
Managed Agents is explicitly framed as the opposite product surface
to the Messages API. Messages = "direct model prompting, custom agent
loops"; Managed Agents = "pre-built, configurable agent harness that
runs in managed infrastructure". Core objects: agent (persona +
tools + skills, versioned), environment (cloud container template),
session (running instance), events (SSE). Runs Claude 4.5+ only.
Rate limits: 60/min create, 600/min read per org, plus tier spend
limits.
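The org-level limits can be guarded client-side with a sliding one-minute window. This sketch assumes "60/min" means a trailing 60-second window. SlidingWindowLimiter is hypothetical and only sees this process's calls, so it complements rather than replaces handling the API's own rate-limit responses.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

# Org-level limits quoted in the overview digest (requests per minute).
LIMITS_PER_MIN: Dict[str, int] = {"create": 60, "read": 600}


class SlidingWindowLimiter:
    """Hypothetical client-side guard keeping create/read calls under per-minute caps."""

    def __init__(self, limits: Dict[str, int] = LIMITS_PER_MIN,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = dict(limits)  # copy so the module-level default is never mutated
        self.clock = clock          # injectable for tests
        self.calls: Dict[str, Deque[float]] = {k: deque() for k in self.limits}

    def try_acquire(self, kind: str) -> bool:
        """Record a call and return True if it fits in the trailing 60 s window."""
        now = self.clock()
        window = self.calls[kind]
        while window and now - window[0] >= 60.0:
            window.popleft()  # forget calls older than one minute
        if len(window) >= self.limits[kind]:
            return False
        window.append(now)
        return True
```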
quickstart
The quickstart installs the ant CLI and SDKs (Python/TS/Go/Java/C#/Ruby/PHP). Flow:
POST /v1/agents → POST /v1/environments → POST /v1/sessions →
POST /v1/sessions/{id}/events, plus an SSE stream at
/v1/sessions/{id}/stream. agent_toolset_20260401 enables the
full built-in toolset. A session stays idle until it receives a user event; the agent
then calls tools autonomously until it emits session.status_idle.
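The run-until-idle loop can be sketched as a small SSE reader. It assumes each data: payload on /v1/sessions/{id}/stream is a JSON object whose "type" field carries the event name. Only session.status_idle comes from the quickstart; any other event names in a test stream are invented placeholders.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_sse(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Minimal SSE parser yielding (event_name, data); ignores id/retry fields."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered message, if it carries data.
            if data:
                yield event, "\n".join(data)
            event, data = None, []
        elif line.startswith(":"):
            continue  # comment line, typically a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip the single optional leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    # Per the SSE spec, a trailing message without a blank line is discarded.


def collect_until_idle(lines: Iterable[str]) -> List[str]:
    """Return event types seen, stopping at session.status_idle (assumed "type" field)."""
    seen: List[str] = []
    for _, data in iter_sse(lines):
        payload = json.loads(data)
        event_type = payload.get("type", "") if isinstance(payload, dict) else ""
        seen.append(event_type)
        if event_type == "session.status_idle":
            break
    return seen
```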
agent-setup
Agents are versioned resources. Fields: name, model, system,
tools, mcp_servers, skills, callable_agents (multi-agent, RP),
description, metadata. Updates generate new versions with
optimistic concurrency (version argument). Lifecycle: update → new
version; list versions; archive (read-only; existing sessions keep
running). Agents can be pinned per-session by passing
{type:"agent", id, version}.
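The versioning contract (every update is a new version, guarded by an expected-version check, with sessions pinned via {type:"agent", id, version}) can be modeled locally to reason about concurrent edits. AgentRecord and VersionConflict are hypothetical; this illustrates the semantics, not the API surface.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


@dataclass
class AgentRecord:
    """Hypothetical local model of a versioned agent; versions[0] is the initial config."""
    id: str
    versions: List[Dict[str, Any]] = field(default_factory=list)
    archived: bool = False

    @property
    def version(self) -> int:
        return len(self.versions)  # 1-based: the initial config is version 1

    def update(self, expected_version: int, **changes: Any) -> int:
        """Append a new version only if nobody else updated since we read it."""
        if self.archived:
            raise PermissionError("archived agents are read-only")
        if expected_version != self.version:
            raise VersionConflict(f"expected v{expected_version}, current v{self.version}")
        self.versions.append({**self.versions[-1], **changes})  # old versions untouched
        return self.version

    def pin(self) -> Dict[str, Any]:
        """Reference a session could pass to stay on this exact version."""
        return {"type": "agent", "id": self.id, "version": self.version}
```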
sessions
A session requires agent + environment_id. Statuses:
idle | running | rescheduling | terminated. Sessions are
stateful — history persisted server-side, container mounted,
retrievable and listable. Event delivery: POST user events, open
SSE stream. Archive preserves history and blocks new events; delete
tears down container + events. Files, memory stores, environments,
and agents are independent and survive session deletion. Supports
vault_ids[] and resources[] (files, memory_stores, GitHub repos).
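A sketch of assembling the session-creation body from the fields this digest names (agent, environment_id, vault_ids[], resources[]). The helper name and the keys inside each resources[] entry are assumptions.

```python
from typing import Any, Dict, List, Optional, Union

AgentRef = Union[str, Dict[str, Any]]  # agent id, or a pinned {type, id, version}


def build_session_request(
    agent: AgentRef,
    environment_id: str,
    vault_ids: Optional[List[str]] = None,
    resources: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Hypothetical POST /v1/sessions body builder; agent + environment_id are required."""
    if not agent:
        raise ValueError("a session requires an agent")
    if not environment_id:
        raise ValueError("a session requires an environment_id")
    body: Dict[str, Any] = {"agent": agent, "environment_id": environment_id}
    if vault_ids:
        body["vault_ids"] = list(vault_ids)   # MCP credentials for this end user
    if resources:
        body["resources"] = list(resources)   # files, memory stores, repos
    return body
```

Omitting empty optional lists keeps the request minimal instead of sending nulls the server may reject.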
skills
Same SKILL.md model as Claude Code: filesystem-based, progressive
disclosure, attached to the agent. Two flavors: anthropic pre-built
(e.g., xlsx, pptx, docx, pdf) and custom org-authored with
versioning (latest or pinned). Cap: 20 skills per session. Skills
are invoked automatically when relevant; they do not consume context
until needed.
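A sketch of building the skills list while enforcing the 20-per-session cap. Only the pre-built/custom split, latest-or-pinned versioning, and the cap come from the skills digest; the entry shape ({"type", "skill_id", "version"}) and helper names are assumptions.

```python
from typing import Dict, List, Optional

MAX_SKILLS_PER_SESSION = 20  # cap quoted in the skills digest


def skill_ref(skill_id: str, kind: str = "anthropic", version: Optional[str] = None) -> Dict[str, str]:
    """Hypothetical skill entry: pre-built ("anthropic") or org-authored ("custom")."""
    if kind not in ("anthropic", "custom"):
        raise ValueError(f"unknown skill kind: {kind}")
    ref = {"type": kind, "skill_id": skill_id}
    if kind == "custom":
        ref["version"] = version or "latest"  # custom skills track latest or a pinned version
    return ref


def check_skill_budget(skills: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Reject configs that would exceed the per-session skill cap."""
    if len(skills) > MAX_SKILLS_PER_SESSION:
        raise ValueError(f"{len(skills)} skills > cap of {MAX_SKILLS_PER_SESSION}")
    return skills
```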
tools
Built-in toolset (agent_toolset_20260401): bash, read, write,
edit, glob, grep, web_fetch, web_search, a subset of Claude Code's
harness tools with a 1:1 mapping. Per-tool enable/disable via configs[];
set default_config.enabled: false for whitelist mode. Custom tools are
client-executed (Messages-API-equivalent tool-use contract) and MCP
servers attach at agent level.
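Whitelist mode amounts to "everything off by default, then re-enable a named set". default_config.enabled and configs[] come from the tools page; the {"name", "enabled"} shape of each configs[] entry is an assumption, and enabled_tools is a hypothetical local resolver for checking a config before sending it.

```python
from typing import Any, Dict, Iterable, Set

BUILTIN_TOOLS = ("bash", "read", "write", "edit", "glob", "grep", "web_fetch", "web_search")


def allowlist_toolset(allowed: Iterable[str]) -> Dict[str, Any]:
    """Toolset entry that disables every built-in by default and re-enables `allowed`."""
    names = list(allowed)
    unknown = set(names) - set(BUILTIN_TOOLS)
    if unknown:
        raise ValueError(f"not in agent_toolset_20260401: {sorted(unknown)}")
    return {
        "type": "agent_toolset_20260401",
        "default_config": {"enabled": False},
        "configs": [{"name": name, "enabled": True} for name in names],
    }


def enabled_tools(toolset: Dict[str, Any]) -> Set[str]:
    """Resolve which built-in tools end up enabled (default on when unspecified)."""
    default_on = toolset.get("default_config", {}).get("enabled", True)
    state = {name: default_on for name in BUILTIN_TOOLS}
    for cfg in toolset.get("configs", []):
        state[cfg["name"]] = cfg["enabled"]  # per-tool override wins over the default
    return {name for name, on in state.items() if on}
```

For a read-only research agent, allowlist_toolset(["read", "glob", "grep"]) drops bash and write entirely rather than relying on prompt instructions.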
memory
Research Preview. Memory stores (memstore_...) are workspace-
scoped collections of ≤100KB text "memories" mounted per session via
resources[].memory_store. Up to 8 stores/session, read_only or
read_write. The agent gets memory_{list,search,read,write,edit,delete}
tools automatically. Every mutation creates an immutable memver_...
version, giving a full audit trail. Writes use optimistic concurrency via
content_sha256/not_exists preconditions, and a redact endpoint
scrubs PII/secrets by deleting the content while keeping the audit record.
This is the first-class replacement for our BM25-over-BQ
long-term memory.
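The write semantics (immutable versions, content_sha256/not_exists preconditions, redaction that keeps the record) can be modeled in-process to reason about concurrent writers. LocalMemoryStore is hypothetical; treating the 100KB limit as 100 × 1024 bytes is an assumption, and whether the real redact endpoint keeps the hash is not covered by the digest.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_MEMORY_BYTES = 100 * 1024  # assumption: "100KB" read as KiB


@dataclass(frozen=True)
class MemoryVersion:
    """Immutable record of one mutation (a local stand-in for memver_...)."""
    path: str
    content: Optional[str]  # None once redacted
    content_sha256: str
    redacted: bool = False


class LocalMemoryStore:
    """Hypothetical in-process model of memory-store semantics, not the API."""

    def __init__(self) -> None:
        self.history: List[MemoryVersion] = []  # append-only audit trail
        self.current: Dict[str, int] = {}       # path -> index of latest version

    @staticmethod
    def sha(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def write(self, path: str, content: str, *, not_exists: bool = False,
              if_sha256: Optional[str] = None) -> MemoryVersion:
        """Append a version only if the precondition holds (optimistic concurrency)."""
        if len(content.encode("utf-8")) > MAX_MEMORY_BYTES:
            raise ValueError("memory exceeds 100KB")
        existing = self.history[self.current[path]] if path in self.current else None
        if not_exists and existing is not None:
            raise RuntimeError("precondition failed: memory already exists")
        if if_sha256 is not None and (existing is None or existing.content_sha256 != if_sha256):
            raise RuntimeError("precondition failed: content changed since read")
        version = MemoryVersion(path, content, self.sha(content))
        self.history.append(version)
        self.current[path] = len(self.history) - 1
        return version

    def redact(self, index: int) -> None:
        """Drop a version's content but keep its audit record in the history."""
        old = self.history[index]
        self.history[index] = MemoryVersion(old.path, None, old.content_sha256, redacted=True)
```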
files
Upload via Files API → mount at resources[].file with arbitrary
mount_path (read-only inside container, absolute paths). Up to 100
files/session. Files are resources independent of session lifecycle.
Session-scoped listing via files.list(scope_id=sesn_...) lets you
retrieve artifacts the agent produced. Copies into session don't
count against storage limits.
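A sketch of turning uploaded files into resources[].file entries with absolute mount paths under the 100-file cap. The file_id/mount_path keys inside each entry are assumptions, and the duplicate-path check is client-side hygiene, not a documented rule.

```python
import posixpath
from typing import Dict, List, Set

MAX_FILES_PER_SESSION = 100  # cap quoted in the files digest


def file_mounts(mounts: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {file_id: mount_path} into hypothetical resources[] file entries."""
    if len(mounts) > MAX_FILES_PER_SESSION:
        raise ValueError(f"{len(mounts)} files > cap of {MAX_FILES_PER_SESSION}")
    seen: Set[str] = set()
    resources: List[Dict[str, str]] = []
    for file_id, path in mounts.items():
        if not posixpath.isabs(path):
            raise ValueError(f"mount_path must be absolute: {path!r}")
        norm = posixpath.normpath(path)  # collapse ".." and duplicate slashes
        if norm in seen:
            raise ValueError(f"duplicate mount_path: {norm}")
        seen.add(norm)
        resources.append({"type": "file", "file_id": file_id, "mount_path": norm})
    return resources
```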
vaults
Per-end-user credential primitive. Workspace-scoped. Holds up to 20
credential objects, each bound immutably to a single
mcp_server_url. Two auth types: mcp_oauth (Anthropic handles
refresh when you register refresh.token_endpoint + client auth
style) and static_bearer. Secret fields write-only, never returned.
vault_ids[] passed at session creation; mid-session rotation
propagates without restart. Only useful for MCP-server auth — not
a general secret manager (cannot inject arbitrary env vars into the
container, cannot hold non-MCP keys).
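The vault constraints (at most 20 credentials, each immutably bound to one mcp_server_url, two auth types, write-only secrets) map naturally onto a frozen dataclass. Vault and Credential are hypothetical local types for reasoning about which credential a given MCP server would receive; they are not the API objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MAX_CREDENTIALS_PER_VAULT = 20
AUTH_TYPES = ("mcp_oauth", "static_bearer")


@dataclass(frozen=True)  # frozen: the mcp_server_url binding cannot change after creation
class Credential:
    mcp_server_url: str
    auth_type: str
    secret: str = field(repr=False)  # write-only server-side; keep it out of logs too

    def public_view(self) -> Dict[str, str]:
        """Everything a read could reasonably return: all fields except the secret."""
        return {"mcp_server_url": self.mcp_server_url, "auth_type": self.auth_type}


@dataclass
class Vault:
    """Hypothetical local model of a per-end-user vault."""
    credentials: List[Credential] = field(default_factory=list)

    def add(self, cred: Credential) -> None:
        if cred.auth_type not in AUTH_TYPES:
            raise ValueError(f"unsupported auth type: {cred.auth_type}")
        if len(self.credentials) >= MAX_CREDENTIALS_PER_VAULT:
            raise ValueError("vault already holds 20 credentials")
        self.credentials.append(cred)

    def for_server(self, url: str) -> Optional[Credential]:
        """Credentials only ever apply to their bound MCP server URL."""
        return next((c for c in self.credentials if c.mcp_server_url == url), None)
```

There is deliberately no way to fetch a credential for anything other than an MCP server URL, which mirrors why vaults cannot stand in for a general secret manager.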
pyfinAgent fit analysis
1. Is it a different product surface?
Yes. Managed Agents is a fully server-hosted, stateful container
harness — Anthropic runs the agent loop, the sandbox, the tool
execution, and persists event history. The Messages API and Vertex paths we rely on
(llm_client.py, all 28 Gemini agents via Vertex, our MAS
orchestrator) are not replaced: Managed Agents only hosts Claude
models (4.5+) and does not support Gemini, so Layer 1 stays on
Messages/Vertex regardless.
2. Would Layer-2 MAS or the harness cycle benefit from migration?
Layer-2 MAS (multi_agent_orchestrator.py): mixed. Managed Agents
would give us free sandboxed bash/file tools, SSE streaming, and
server-side conversation state — but we already run these agents in
our own FastAPI process and need tight integration with BQ, paper
trader, ticket queue. Migration cost is high for modest gain.
Harness cycle (scripts/harness/run_harness.py,
autonomous_harness.py): potentially high-value. The harness is
long-running, tool-heavy, already follows Plan→Generate→Evaluate with
Claude Opus. Managed Agents natively supports durable sessions,
resume semantics, an event log equivalent to our handoff/ directory, SSE streaming to
the frontend Harness tab, vaults for MCP auth, and memory stores for
cross-cycle learnings (replacing our pyfinagent_data.harness_learning_log
BQ table). The dual-evaluator pattern maps cleanly onto
callable_agents (multi-agent, Research Preview).
3. Cost / retention / residency
No public pricing table appears on these pages; container compute is billed in addition to model inference. Rate limits are 60 create / 600 read requests per minute per org. Data residency is not discussed, so assume US-only until Anthropic documents otherwise. That is a blocker for any EU-residency-sensitive data (our GCP billing export is in the EU; none of our prod data is). Session archive preserves history indefinitely; delete is a hard, irreversible delete. Memory versions accumulate indefinitely until explicitly deleted or redacted.
4. Vaults vs. GCP Secret Manager / env vars
Vaults are narrowly scoped to MCP-server auth. They do not replace Secret Manager for GCP service accounts, Slack signing secrets, NextAuth keys, Anthropic/Gemini API keys, etc. If we ever add user-authorized MCP servers (e.g., per-user Slack, Linear, or GitHub OAuth for the Slack bot), vaults would be the right tool and would save us from writing a per-user OAuth token store. For our current single-tenant, admin-only app they have no immediate relevance.
5. Does it solve our file-based handoff problem?
Partially, and worth serious thought for phase-4.11+. Our
handoff/current/{contract,experiment_results,evaluator_critique}.md and
harness_log.md files are essentially a hand-rolled implementation of what Managed Agents provides natively: session event history + memory store + session resources[]. The five-file protocol is load-bearing precisely because the Messages API has no server-side session. If we move the harness loop onto Managed Agents, three of the five files become server-side primitives; we'd keep contract.md (human-readable plan) and harness_log.md (cross-cycle summary, which maps to a memory store). But the protocol's real value is Anthropic's "harness design" discipline (immutable success criteria, dual evaluator, research gate), which is orthogonal to where state lives. Moving to Managed Agents would remove only the file plumbing, not the discipline.
MUST FIX
None. This is greenfield. Our current harness is conformant with the Anthropic harness-design doctrine; it just uses a different storage substrate.
NICE TO HAVE / adoption evaluation
Ranked by ROI:
- Pilot the harness cycle on Managed Agents (phase-4.12 candidate). Single-agent first: port the run_harness.py GENERATE phase to a Managed Agent session with agent_toolset_20260401, attach a memory store in place of harness_learning_log, and keep qa-evaluator and harness-verifier as local subagents until callable_agents leaves Research Preview. Expected wins: eliminating the zombie-worker problems, a free SSE stream for the Harness tab, and audited memory versioning at no extra effort. Request memory and multi-agent Research Preview access via the form linked in the overview page.
- Adopt Anthropic pre-built skills (xlsx/pptx/docx/pdf) for the Slack bot and investor-report flow. This replaces any hand-rolled openpyxl/python-pptx paths. Zero migration cost: just attach them to the agent config.
- Defer vaults until we add user-facing MCP integrations. The current single-tenant model doesn't need them; GCP Secret Manager continues to cover service-account secrets.
- Do NOT migrate Layer-1 or Layer-2 yet. Layer 1 is Gemini-bound and Managed Agents is Claude-only. Layer 2 has too much local orchestration (paper trader, ticket queue) to justify the container round-trip cost per turn.
- Stress-test doctrine check. Per CLAUDE.md, "every harness component encodes an assumption about what the model can't do". Managed Agents is Anthropic's own answer to the same question, so it is worth re-running a representative harness step in a Managed Agent session (no local five-file plumbing) and comparing output quality and cost against our current run. That experiment directly tests whether our scaffolding is still load-bearing.
References
- https://platform.claude.com/docs/en/managed-agents/overview
- https://platform.claude.com/docs/en/managed-agents/quickstart
- https://platform.claude.com/docs/en/managed-agents/agent-setup
- https://platform.claude.com/docs/en/managed-agents/sessions
- https://platform.claude.com/docs/en/managed-agents/skills
- https://platform.claude.com/docs/en/managed-agents/tools
- https://platform.claude.com/docs/en/managed-agents/memory
- https://platform.claude.com/docs/en/managed-agents/files
- https://platform.claude.com/docs/en/managed-agents/vaults
- https://www.anthropic.com/engineering/harness-design-long-running-apps (project canonical harness doctrine)
- /Users/ford/.openclaw/workspace/pyfinagent/CLAUDE.md: harness protocol, research-gate, stress-test doctrine
- /Users/ford/.openclaw/workspace/pyfinagent/scripts/harness/run_harness.py
- /Users/ford/.openclaw/workspace/pyfinagent/backend/autonomous_harness.py
- /Users/ford/.openclaw/workspace/pyfinagent/backend/agents/multi_agent_orchestrator.py