---
name: generate-config
description: Generate and validate mcpbr configuration files for MCP server benchmarking.
---

# Instructions

You are an expert at creating valid mcpbr configuration files. Your goal is to help users create correct YAML configs for their MCP servers.
## Critical Requirements

- **Always Include `{workdir}` Placeholder:** The `args` array MUST include `"{workdir}"` as a placeholder for the task repository path. This is CRITICAL - mcpbr replaces it at runtime with the actual working directory.
- **Valid Commands:** Ensure the `command` field uses an executable that exists on the user's system:
  - `npx` for Node.js-based MCP servers
  - `uvx` for Python MCP servers via uv
  - `python` or `python3` for direct Python execution
  - Custom binaries (verify they exist with `which <command>`)
- **Model Aliases:** Use short aliases when possible:
  - `sonnet` instead of `claude-sonnet-4-5-20250929`
  - `opus` instead of `claude-opus-4-5-20251101`
  - `haiku` instead of `claude-haiku-4-5-20251001`
- **Required Fields:** Every config MUST have:
  - `mcp_server.command`
  - `mcp_server.args` (with `"{workdir}"`)
  - `provider` (usually `"anthropic"`)
  - `agent_harness` (usually `"claude-code"`)
  - `model`
  - `dataset` (or rely on the benchmark default)
## Common MCP Server Configurations

### Anthropic Filesystem Server

```yaml
mcp_server:
  name: "filesystem"
  command: "npx"
  args:
    - "-y"
    - "@modelcontextprotocol/server-filesystem"
    - "{workdir}"
  env: {}
```
### Custom Python MCP Server

```yaml
mcp_server:
  name: "my-server"
  command: "uvx"
  args:
    - "my-mcp-server"
    - "--workspace"
    - "{workdir}"
  env:
    LOG_LEVEL: "debug"
```
### Supermodel Codebase Analysis

```yaml
mcp_server:
  name: "supermodel"
  command: "npx"
  args:
    - "-y"
    - "@supermodeltools/mcp-server"
    - "{workdir}"  # CRITICAL: mcpbr substitutes the task repository path here
  env:
    SUPERMODEL_API_KEY: "${SUPERMODEL_API_KEY}"
```
## Configuration Template

When generating a new config, use this template:

```yaml
mcp_server:
  name: "<server-name>"
  command: "<executable>"
  args:
    - "<arg1>"
    - "<arg2>"
    - "{workdir}"  # CRITICAL: Include this placeholder
  env: {}

provider: "anthropic"
agent_harness: "claude-code"
model: "sonnet"  # or "opus", "haiku"
dataset: "SWE-bench/SWE-bench_Lite"  # or null to use benchmark default
sample_size: 5
timeout_seconds: 300
max_concurrent: 4
max_iterations: 30
```
## Validation Steps

Before saving a config, validate:

- **Workdir Placeholder:** Ensure `"{workdir}"` appears in the `args` array.
- **Command Exists:** Verify the command is available:

  ```bash
  which npx  # or uvx, python, etc.
  ```

- **Syntax:** YAML syntax is correct (no tabs, proper indentation).
- **Environment Variables:** If using env vars like `${API_KEY}`, remind the user to set them.
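The first three checks can be automated. The sketch below is illustrative and not part of mcpbr; the `validate_config_text` helper is a hypothetical name, and it inspects the raw YAML text rather than parsing it:

```python
import re
import shutil

def validate_config_text(text: str) -> list[str]:
    """Return a list of problems found in an mcpbr config's raw YAML text.

    Hypothetical helper mirroring the manual validation steps above;
    mcpbr does not ship this function.
    """
    problems = []
    # 1. The args array must carry the {workdir} placeholder.
    if '"{workdir}"' not in text:
        problems.append('missing "{workdir}" placeholder in args')
    # 2. The command executable should exist on PATH.
    match = re.search(r'command:\s*"?([\w./-]+)"?', text)
    if match and shutil.which(match.group(1)) is None:
        problems.append(f"command not found on PATH: {match.group(1)}")
    # 3. YAML is whitespace-sensitive: tab indentation is invalid.
    if "\t" in text:
        problems.append("tab characters found; use 2-space indentation")
    return problems

config = """\
mcp_server:
  name: "filesystem"
  command: "sh"
  args:
    - "{workdir}"
"""
print(validate_config_text(config))  # an empty list if everything checks out
```

A real-world version would parse the YAML properly before checking fields, but simple text checks catch the most common failures cheaply.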
## Benchmark-Specific Configurations

### SWE-bench (Default)

```yaml
# ... mcp_server config ...

provider: "anthropic"
agent_harness: "claude-code"
model: "sonnet"
dataset: "SWE-bench/SWE-bench_Lite"  # or SWE-bench/SWE-bench_Verified
sample_size: 10
```

### CyberGym

```yaml
# ... mcp_server config ...

provider: "anthropic"
agent_harness: "claude-code"
model: "sonnet"
benchmark: "cybergym"
dataset: "sunblaze-ucb/cybergym"
cybergym_level: 2  # 0-3
sample_size: 10
```

### MCPToolBench++

```yaml
# ... mcp_server config ...

provider: "anthropic"
agent_harness: "claude-code"
model: "sonnet"
benchmark: "mcptoolbench"
dataset: "MCPToolBench/MCPToolBenchPP"
sample_size: 10
```
## Custom Agent Prompts

Users can customize the agent prompt using the `agent_prompt` field:

```yaml
agent_prompt: |
  Fix the following bug in this repository:

  {problem_statement}

  Make the minimal changes necessary to fix the issue.
  Focus on the root cause, not symptoms.
```

**Important:** The `{problem_statement}` placeholder is required and will be replaced with the actual task description.
## Common Mistakes to Avoid

- **Missing `{workdir}`:** Forgetting to include `"{workdir}"` in `args`.
- **Hardcoded Paths:** Never hardcode absolute paths like `/workspace` or `/tmp/repo`.
- **Invalid Commands:** Using commands that don't exist (e.g., `uv` instead of `uvx`).
- **Wrong Indentation:** YAML is whitespace-sensitive. Use 2 spaces, not tabs.
- **Missing Quotes:** Environment variable references like `"${VAR}"` need quotes.
## Example Workflow

When a user asks to create a config:

1. **Ask about their MCP server:**
   - What package/command runs the server?
   - Does it need any special arguments or environment variables?
   - Is it Node.js-based (`npx`) or Python-based (`uvx`)?

2. **Generate the config** based on their answers.

3. **Validate the config:**
   - Check for the `{workdir}` placeholder
   - Verify the command exists
   - Confirm YAML syntax

4. **Save the config** (usually to `mcpbr.yaml`).

5. **Optionally test the config** with a small sample:

   ```bash
   mcpbr run -c mcpbr.yaml -n 1 -v
   ```
Helpful Commands
# Generate a default config
mcpbr init
# List available models
mcpbr models
# List available benchmarks
mcpbr benchmarks
# Validate config by doing a dry run with 1 task
mcpbr run -c config.yaml -n 1 -v