name: profile-engine
description: Profile the Python engine to identify performance bottlenecks in NumPy/Numba code
disable-model-invocation: true
Profile the OSMOSE Python engine to identify performance bottlenecks and Numba JIT compilation issues.
Arguments
- config (optional): "bob" (Bay of Biscay) or "eec" (EEC); default: "bob"
- years (optional): simulation years (default: 5)
- focus (optional): specific module to deep-profile (e.g., "predation", "movement")
Steps
-
Run cProfile to get a high-level function-time breakdown:
    .venv/bin/python -m cProfile -s cumulative -m osmose.engine.cli --config {config} --years {years} 2>&1 | head -40
If there is no CLI entry point, use this inline approach:
    .venv/bin/python -c "import cProfile; from scripts.benchmark_engine import run_benchmark; cProfile.run('run_benchmark()', sort='cumulative')" 2>&1 | head -40
-
Identify top 5 hotspots: From the cProfile output, list the five functions with the highest cumulative time. Focus on functions under osmose/engine/, not NumPy/Numba internals.
-
Check Numba JIT status for hotspot functions:
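    .venv/bin/python -c "import numba; numba.config.DEVELOPER_MODE = True; from osmose.engine.processes import predation; print('Module imported without Numba errors')"
Importing the module compiles only functions declared with explicit signatures; lazily compiled functions are compiled on their first call, so a clean import alone does not prove that every hotspot compiles. Look for: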
- Functions falling back to object mode (kills performance)
- Type inference failures
- Unsupported Python features inside @njit
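The checks above can also be made per function rather than per import. The following is a minimal sketch, assuming that Numba dispatchers expose the `signatures` and `nopython_signatures` attributes in the installed version; `jit_report` is a hypothetical helper, not part of OSMOSE. It detects dispatchers by duck typing, so it does not import Numba itself.

```python
NOT_COMPILED = "not compiled"   # lazy function that has not been called yet
OBJECT_MODE = "object mode"     # at least one specialization is not nopython
NOPYTHON = "nopython"           # every specialization compiled in nopython mode


def jit_report(module):
    """Return {function name: status} for each Numba-compiled object in `module`."""
    report = {}
    for name, obj in vars(module).items():
        # Assumption: only Numba dispatchers carry `nopython_signatures`.
        if not hasattr(obj, "nopython_signatures"):
            continue
        compiled = list(obj.signatures)
        nopython = list(obj.nopython_signatures)
        if not compiled:
            report[name] = NOT_COMPILED
        elif len(nopython) < len(compiled):
            report[name] = OBJECT_MODE
        else:
            report[name] = NOPYTHON
    return report
```

A typical use would be to run one short simulation first so that lazy functions are compiled, then call `jit_report(importlib.import_module("osmose.engine.processes.predation"))`. Functions decorated with `@njit` do not fall back to object mode; they raise a typing error at first call instead, so an "object mode" status can only come from plain `@jit` code on older Numba releases.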
-
Check for non-vectorized loops: Search hotspot files for Python loops over arrays that should be vectorized:
    grep -n "for.*in range" osmose/engine/processes/{focus}.py
Flag any loop iterating over array elements that could use NumPy broadcasting.
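To illustrate the kind of loop worth flagging, here is a small sketch that computes a pairwise ratio first with nested Python loops and then with broadcasting. The function and array names (`size_ratio_loop`, `pred_len`, `prey_len`) are hypothetical and are not taken from the OSMOSE code base.

```python
import numpy as np


def size_ratio_loop(pred_len, prey_len):
    """Pairwise predator/prey length ratio using explicit Python loops."""
    n, m = len(pred_len), len(prey_len)
    out = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            out[i, j] = pred_len[i] / prey_len[j]
    return out


def size_ratio_vectorized(pred_len, prey_len):
    """Same result via broadcasting: (n, 1) / (1, m) -> (n, m)."""
    return pred_len[:, None] / prey_len[None, :]
```

Loops of this shape are only a problem in interpreted code. Inside an `@njit` function an explicit loop is compiled to machine code and is usually fine, so grep hits in jitted functions should not be flagged automatically.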
-
Check memory allocation patterns: Look for array allocations inside hot loops:
    grep -n "np.zeros\|np.empty\|np.array" osmose/engine/processes/{focus}.py
Arrays should be pre-allocated outside loops where possible.
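The pre-allocation pattern can be sketched as follows. Both functions apply the same proportional loss over several steps; the second allocates its scratch buffer once and writes into it through the `out=` argument of NumPy ufuncs. The names `decay_allocating`, `decay_preallocated`, `biomass`, and `rate` are illustrative assumptions, not engine identifiers.

```python
import numpy as np


def decay_allocating(biomass, rate, n_steps):
    """Allocates two fresh arrays on every iteration."""
    for _ in range(n_steps):
        loss = biomass * rate        # new temporary each step
        biomass = biomass - loss     # another new array each step
    return biomass


def decay_preallocated(biomass, rate, n_steps):
    """Reuses one working copy and one scratch buffer for all iterations."""
    biomass = biomass.copy()             # keep the caller's array untouched
    loss = np.empty_like(biomass)        # allocated once, outside the loop
    for _ in range(n_steps):
        np.multiply(biomass, rate, out=loss)
        np.subtract(biomass, loss, out=biomass)
    return biomass
```

The gain depends on array size and loop count, so it is worth confirming with a warm timing before recommending the change.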
-
Compare against baseline timing:
- Bay of Biscay 5yr baseline: ~2.0s
- EEC 5yr baseline: ~5.2s
If current timing exceeds baseline by >10%, flag as regression.
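The baseline comparison can be expressed as a small check. This sketch uses the 5-year Python baselines quoted above and the 10% threshold; `warm_timing` and `is_regression` are hypothetical helper names, and `run` stands for whatever callable launches the simulation.

```python
import time

# 5-year Python baselines from this document, in seconds.
BASELINES_S = {"bob": 2.0, "eec": 5.2}


def warm_timing(run):
    """Call `run` twice and return the duration of the second (warm) call."""
    run()  # cold call: includes any JIT compilation
    start = time.perf_counter()
    run()
    return time.perf_counter() - start


def is_regression(elapsed_s, config="bob", tolerance=0.10):
    """True when `elapsed_s` exceeds the baseline by more than `tolerance`."""
    return elapsed_s > BASELINES_S[config] * (1.0 + tolerance)
```

Both calls in `warm_timing` happen in the same process, so the second one reuses compiled code. If each run is launched as a separate process, the warm run only benefits when the engine caches its compiled functions on disk, which this sketch does not verify. The baselines apply to 5-year runs only.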
-
Report findings as a table:
| Rank | Function | Time (s) | % Total | Issue |
|------|----------|----------|---------|-------|
| 1 | predation._compute_kernel | X.XX | XX% | none |
| 2 | movement._distribute | X.XX | XX% | Non-vectorized loop |
-
Suggest optimizations for any identified bottlenecks:
- Python loop → NumPy vectorization
- NumPy operation → Numba @njit
- Repeated allocation → pre-allocated buffer
- Object-mode Numba → fix type annotations
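The profiling, hotspot selection, and reporting steps can also be scripted instead of reading `head -40` output by eye. The sketch below uses the standard `cProfile` and `pstats` modules; `top_hotspots` and `format_table` are hypothetical helpers, and the `osmose/engine/` path filter assumes POSIX-style paths. The Issue column is left empty to be filled in after the later checks.

```python
import cProfile
import pstats


def top_hotspots(func, path_filter="osmose/engine/", n=5):
    """Profile `func()` and return up to n (label, cumulative_s, percent) rows
    for functions whose source path contains `path_filter`."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()

    stats = pstats.Stats(profiler)
    total = stats.total_tt
    rows = []
    # Each entry maps (file, line, name) -> (primitive calls, calls, tottime, cumtime, callers).
    for (filename, lineno, name), (_cc, _nc, _tt, cum_s, _callers) in stats.stats.items():
        if path_filter in filename:
            pct = 100.0 * cum_s / total if total > 0 else 0.0
            rows.append((f"{filename}:{lineno} {name}", cum_s, pct))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


def format_table(rows):
    """Render rows in the report layout, leaving the Issue column blank."""
    lines = [
        "| Rank | Function | Time (s) | % Total | Issue |",
        "|------|----------|----------|---------|-------|",
    ]
    for rank, (label, cum_s, pct) in enumerate(rows, start=1):
        lines.append(f"| {rank} | {label} | {cum_s:.2f} | {pct:.0f}% | |")
    return "\n".join(lines)
```

Assuming `run_benchmark` from `scripts.benchmark_engine` is importable, `print(format_table(top_hotspots(run_benchmark)))` would produce the ranked table. Percentages are cumulative time over total profiled time, so nested functions overlap and the column does not sum to 100%.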
Rules
- Always use .venv/bin/python, never the system Python
- Run from /home/razinka/osmose/osmose-python/
- First JIT compilation is slow; run twice and report the second (warm) timing
- Do NOT modify engine code during profiling — this is a read-only analysis
- Always verify parity after any suggested optimization is applied
- Performance baselines: BoB 5yr ~2.0s, EEC 5yr ~5.2s (Python), Java BoB ~2.3s, EEC ~7.2s
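The parity rule can be checked mechanically once an optimization has been applied. This is a minimal sketch under the assumption that "parity" means numerically matching outputs between a reference run and the optimized run, and that each run's outputs are available as a dict of NumPy arrays keyed by name. `assert_parity`, the dict layout, and the default tolerance are all hypothetical.

```python
import numpy as np


def assert_parity(reference, candidate, rtol=1e-9, atol=0.0):
    """Raise AssertionError if the two output sets differ beyond tolerance."""
    if reference.keys() != candidate.keys():
        raise AssertionError(
            f"output names differ: {sorted(reference)} vs {sorted(candidate)}"
        )
    for name in reference:
        np.testing.assert_allclose(
            candidate[name],
            reference[name],
            rtol=rtol,
            atol=atol,
            err_msg=f"parity mismatch in '{name}'",
        )
```

The right tolerance depends on how the project defines parity; optimizations that reorder floating-point operations may need a looser `rtol` than exact refactors.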