AlphaEvolve Open-Source Implementation: Coding Agent Blueprint
Overview
This document serves as the architectural blueprint and execution plan for building an open-source, asynchronous evolutionary code-generation pipeline inspired by DeepMind's AlphaEvolve. The system prioritizes local execution, high throughput, and secure isolation, making it ideal for researching optimization techniques in complex neural architectures, physical simulations, and low-level system operations.
Architectural Pillars
1. Core Architecture & Data Structures (Python)
The foundation of the pipeline relies on modern, asynchronous Python (asyncio) to maintain high throughput and non-blocking execution.
- Initialization: Set up a modular project structure with strict type hinting and a robust logging framework to monitor throughput and error rates.
- Data Models:
  - `Program`: A dataclass storing the source code string, historical evaluation score(s), execution logs, and a unique identifier.
  - `Diff`: A dataclass strictly modeling the `<<<<<<< SEARCH` and `>>>>>>> REPLACE` mutation format.
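The two data models could be sketched as follows. This is a minimal illustration, not a fixed schema: the field names (`source`, `scores`, `logs`, `id`), the `best_score` helper, and the rule that a SEARCH block must match exactly once are assumptions made for the example.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class Program:
    """One candidate solution tracked by the pipeline (field names are assumed)."""

    source: str
    scores: list[float] = field(default_factory=list)  # evaluation history
    logs: list[str] = field(default_factory=list)      # execution logs
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def best_score(self) -> float | None:
        # None until the program has been evaluated at least once.
        return max(self.scores) if self.scores else None


@dataclass(frozen=True)
class Diff:
    """A single SEARCH/REPLACE mutation."""

    search: str
    replace: str

    def apply(self, source: str) -> str:
        # Assumed policy: refuse missing or ambiguous matches instead of guessing.
        if source.count(self.search) != 1:
            raise ValueError("SEARCH block must match exactly once")
        return source.replace(self.search, self.replace, 1)
```

Rejecting ambiguous matches keeps a mutation from silently editing the wrong occurrence; a more permissive policy (first match wins) is equally possible.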
2. The LLM Orchestrator (Ollama Integration)
The generation engine utilizes a local open-weight model (gemma4:26b-a4b-it-q4_K_M) orchestrated via Ollama.
- Asynchronous Client: Implement a non-blocking API client (using `aiohttp` or `httpx`) pointing to the local Ollama instance (default port 11434).
- PromptBuilder Module: Construct dynamic prompts injecting system instructions, historical high-performing code (context), and the targeted code block.
- Parser & Retry Logic: Build resilient parsing to extract the specific SEARCH/REPLACE diff format, with automatic retries for malformed JSON or invalid diffs.
- Design Note: Keep the interface abstract. While Ollama handles the current workload, the architecture must allow seamless swapping to custom PyTorch or Hugging Face inference loops for future parameter tuning or sequence modeling experiments.
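A rough sketch of how the abstract interface, the Ollama client, and the parse-and-retry step might fit together. The names `LLMBackend`, `OllamaBackend`, `parse_diffs`, and `generate_diffs` are hypothetical, as is the `=======` divider between the SEARCH and REPLACE blocks and the retry count of three. The client uses `httpx` against Ollama's `/api/generate` endpoint with streaming disabled.

```python
import asyncio
import re
from typing import Protocol

# Assumed wire format: SEARCH block, a "=======" divider, then the REPLACE block.
DIFF_RE = re.compile(
    r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)


class LLMBackend(Protocol):
    """Anything with an async generate() can stand in for Ollama."""

    async def generate(self, prompt: str) -> str: ...


class OllamaBackend:
    def __init__(self, model: str, base_url: str = "http://localhost:11434"):
        self.model = model
        self.base_url = base_url

    async def generate(self, prompt: str) -> str:
        import httpx  # imported lazily so the parser is usable without it

        payload = {"model": self.model, "prompt": prompt, "stream": False}
        async with httpx.AsyncClient(timeout=120.0) as client:
            resp = await client.post(f"{self.base_url}/api/generate", json=payload)
            resp.raise_for_status()
            return resp.json()["response"]


def parse_diffs(text: str) -> list[tuple[str, str]]:
    """Return (search, replace) pairs, or raise if the reply contains none."""
    diffs = DIFF_RE.findall(text)
    if not diffs:
        raise ValueError("no SEARCH/REPLACE block found")
    return diffs


async def generate_diffs(
    backend: LLMBackend, prompt: str, max_attempts: int = 3
) -> list[tuple[str, str]]:
    """Ask the backend again whenever its reply cannot be parsed."""
    last_error: Exception | None = None
    for _ in range(max_attempts):
        reply = await backend.generate(prompt)
        try:
            return parse_diffs(reply)
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid diff after {max_attempts} attempts") from last_error
```

Because `generate_diffs` depends only on the `LLMBackend` protocol, a PyTorch or Hugging Face inference loop could be dropped in by implementing the same `generate` coroutine.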
3. The Evaluator Sandbox (gVisor & Docker)
Executing untrusted, AI-generated code requires a tightly isolated, ephemeral sandbox so that a faulty or hostile candidate cannot destabilize the host.
- Docker SDK Integration: Programmatically manage container lifecycles using the `docker` Python SDK.
- Isolation Engine: Utilize the `runsc` (gVisor) runtime for robust kernel-level system call interception without the overhead of a full VM.
- Environment Constraints:
  - `network_mode='none'` to ensure a strictly air-gapped environment.
  - `read_only=True` for the root filesystem.
  - Mount a highly restricted `tmpfs` volume (e.g., 5MB at `/tmp/eval`) exclusively for capturing standard output and evaluation metrics.
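- Resource Quotas: Enforce strict cgroup limits (`mem_limit`, `cpu_quota`) so that memory leaks are OOM-killed and runaway loops are throttled. These two limits do not stop fork bombs on their own; a process cap such as `pids_limit` is needed for that.
- Timeouts: Wrap the execution command in an `asyncio.wait_for` hard wall-clock timeout (e.g., 10 seconds).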
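The constraints above might translate into Docker SDK arguments roughly as follows. The function names `sandbox_kwargs` and `run_sandboxed` are hypothetical, and the concrete quota values (256 MB, half a CPU, 64 processes) are placeholder assumptions, not recommendations. The sketch also assumes the `runsc` runtime is already registered with the Docker daemon. Note that `asyncio.wait_for` only cancels the waiting coroutine, so the container is killed explicitly when the timeout fires.

```python
import asyncio


def sandbox_kwargs(image: str, command: list[str]) -> dict:
    """Keyword arguments for docker's containers.run(); values are assumptions."""
    return {
        "image": image,
        "command": command,
        "runtime": "runsc",                 # gVisor
        "network_mode": "none",             # no network access
        "read_only": True,                  # read-only root filesystem
        "tmpfs": {"/tmp/eval": "size=5m"},  # small scratch space
        "mem_limit": "256m",                # assumed value
        "cpu_period": 100_000,
        "cpu_quota": 50_000,                # assumed: half of one CPU
        "pids_limit": 64,                   # assumed: caps fork bombs
        "detach": True,
    }


async def run_sandboxed(image: str, command: list[str], timeout_s: float = 10.0) -> str:
    """Run one candidate and return its stdout, or raise on timeout."""
    import docker  # imported lazily; requires the docker SDK and a running daemon

    client = docker.from_env()
    container = await asyncio.to_thread(
        lambda: client.containers.run(**sandbox_kwargs(image, command))
    )
    try:
        # container.wait() blocks, so it runs in a worker thread.
        await asyncio.wait_for(asyncio.to_thread(container.wait), timeout=timeout_s)
        return container.logs(stdout=True, stderr=False).decode()
    except asyncio.TimeoutError:
        container.kill()  # the timeout alone does not stop the container
        raise
    finally:
        container.remove(force=True)  # keep the sandbox ephemeral
```

Keeping the configuration in a pure function makes the isolation settings easy to unit-test without a Docker daemon.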
4. The MAP-Elites Database
The evolutionary memory of the system, balancing exploitation of high scores with exploration of novel logic.
- Structure: Implement a `ProgramDatabase` utilizing a multi-dimensional archive inspired by MAP-Elites. Programs are binned not just by score, but by behavioral characteristics (e.g., AST depth, execution time, reliance on specific libraries).
- Sampling Strategy: The `sample()` method must mathematically favor a mix of elite performers and diverse, under-explored approaches to avoid collapsing into local optima.
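One possible shape for such an archive is sketched below. It is a simplified, MAP-Elites-style grid, not a faithful reproduction of any published implementation: each cell keeps its best program, and `sample()` mixes rank-weighted exploitation with a bias toward rarely visited cells. The `Entry` type, the `bin_edges` layout, and the `explore_prob` value of 0.3 are all assumptions.

```python
import bisect
import random
from dataclasses import dataclass


@dataclass
class Entry:
    source: str
    score: float
    features: tuple[float, ...]  # e.g. (ast_depth, runtime_seconds)


class ProgramDatabase:
    def __init__(self, bin_edges, explore_prob: float = 0.3, seed=None):
        self.bin_edges = bin_edges        # one sorted list of edges per feature
        self.explore_prob = explore_prob  # assumed exploration rate
        self.cells: dict[tuple[int, ...], Entry] = {}
        self.visits: dict[tuple[int, ...], int] = {}
        self.rng = random.Random(seed)

    def _cell(self, features) -> tuple[int, ...]:
        # Map each feature value to the index of the bin it falls into.
        return tuple(
            bisect.bisect_right(edges, f) for edges, f in zip(self.bin_edges, features)
        )

    def add(self, entry: Entry) -> bool:
        """Keep the entry only if its cell is empty or it beats the incumbent."""
        key = self._cell(entry.features)
        incumbent = self.cells.get(key)
        if incumbent is None or entry.score > incumbent.score:
            self.cells[key] = entry
            self.visits.setdefault(key, 0)
            return True
        return False

    def sample(self) -> Entry:
        if not self.cells:
            raise LookupError("database is empty")
        keys = list(self.cells)
        if self.rng.random() < self.explore_prob:
            # Exploration: favor cells that have been sampled least often.
            weights = [1.0 / (1 + self.visits[k]) for k in keys]
        else:
            # Exploitation: rank-based weights, best cell gets the largest weight.
            ranked = sorted(keys, key=lambda k: self.cells[k].score)
            rank = {k: i + 1 for i, k in enumerate(ranked)}
            weights = [float(rank[k]) for k in keys]
        key = self.rng.choices(keys, weights=weights, k=1)[0]
        self.visits[key] += 1
        return self.cells[key]
```

Rank-based weights keep a single outlier score from dominating selection, while the visit counter gives neglected niches a steady chance of being picked as parents.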
5. The Asynchronous Controller Loop
The central nervous system linking all components into a concurrent pipeline.
- Lifecycle Management: Implement the main `async` loop: `Sample -> Prompt -> Generate -> Apply Diff -> Evaluate -> Update Database`.
- Concurrency Control: Deploy `asyncio.Semaphore` or queues to throttle simultaneous requests, so that neither the Ollama instance (generating on the RTX 4060 Ti) nor the Docker daemon is overwhelmed.
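The loop and its throttling could look roughly like this. The stage callables (`sample`, `generate`, `evaluate`, `update`) are hypothetical stand-ins for the components described earlier, with `generate` assumed to cover the Prompt, Generate, and Apply Diff steps. The worker count and semaphore sizes are placeholder assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def evolve(
    sample: Callable[[], str],
    generate: Callable[[str], Awaitable[str]],
    evaluate: Callable[[str], Awaitable[float]],
    update: Callable[[str, float], None],
    iterations: int,
    workers: int = 8,
    max_llm: int = 2,
    max_sandboxes: int = 4,
) -> None:
    llm_slots = asyncio.Semaphore(max_llm)            # caps concurrent LLM calls
    sandbox_slots = asyncio.Semaphore(max_sandboxes)  # caps concurrent containers
    tickets = iter(range(iterations))                 # shared budget of iterations

    async def worker() -> None:
        for _ in tickets:  # each ticket is consumed by exactly one worker
            parent = sample()
            try:
                async with llm_slots:
                    child = await generate(parent)
                async with sandbox_slots:
                    score = await evaluate(child)
            except Exception:
                continue  # assumed policy: drop a failed candidate and move on
            update(child, score)

    await asyncio.gather(*(worker() for _ in range(workers)))
```

Using long-lived workers rather than launching every iteration at once means later samples can draw on results already written back to the database.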
6. The "Hello World" Benchmark (Knapsack Heuristic)
Validate the end-to-end pipeline using a deterministic, lightweight optimization problem.
- Initial Seed: Inject the Fractional Knapsack heuristic script as the initial `Program` seed.
- Evaluation Capture: Configure the sandbox to capture stdout, parsing the JSON string outputted by the `evaluate()` function to extract the score float.
- Success Criteria: Run the pipeline until the `ProgramDatabase` registers a program with the known global optimum score of 275.0, confirming that generation, mutation, execution, and selection are all functioning correctly.
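A seed script and the matching score extraction might look like the sketch below. The item list and capacity are a hypothetical textbook instance chosen purely for illustration; it scores 240.0 and is not the benchmark instance whose optimum is 275.0. The `{"score": ...}` JSON shape and the helper names `parse_score` and `reached_target` are likewise assumptions.

```python
import json


def fractional_knapsack(items, capacity: float) -> float:
    """Greedy by value/weight ratio; items are (value, weight) pairs."""
    total = 0.0
    remaining = capacity
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if remaining <= 0:
            break
        take = min(weight, remaining)  # whole item, or the fraction that fits
        total += value * take / weight
        remaining -= take
    return total


def evaluate() -> str:
    # Hypothetical instance, NOT the benchmark behind the 275.0 target.
    items = [(60.0, 10.0), (100.0, 20.0), (120.0, 30.0)]
    return json.dumps({"score": fractional_knapsack(items, 50.0)})


def parse_score(stdout: str) -> float:
    """Read the score from the last non-empty stdout line (assumed convention)."""
    lines = [ln for ln in stdout.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("sandbox produced no output")
    return float(json.loads(lines[-1])["score"])


def reached_target(score: float, target: float = 275.0, tol: float = 1e-9) -> bool:
    # Compare with a tolerance instead of exact float equality.
    return abs(score - target) <= tol


if __name__ == "__main__":
    print(evaluate())
```

Reading only the last non-empty line lets a candidate print debugging output without corrupting the score.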