AGENTS.md — Instructions for the Coding Agent
This file is the primary instruction set for any AI coding agent working on this repository. Read it fully before writing any code. It defines what to build, how to build it, what not to do, and how to verify your work.
1. Project Mission
You are implementing the Oracle Gap research software system. The goal is to compute the theoretical upper bound of achievable returns (the "oracle ceiling") in binary prediction markets, and measure how close real participants come to it, across a parametric investor spectrum.
Phase 0 scope (your current task): Build the complete end-to-end pipeline for Bitcoin 5-minute direction markets on Polymarket (event class C1). Nothing else. The architecture must be general (class-agnostic), but only the BTC C1 path needs to be wired up and tested.
2. The Single Most Important Rule
The oracle formula is the ground truth. Every line of code that touches it must have a passing unit test before it is considered done.
The formula is in docs/MATH_MODEL.md. If you implement it incorrectly, all downstream results are wrong and the paper cannot be published. Test it obsessively. See tests/test_oracle.py for required test cases.
3. What To Build (Phase 0 Checklist)
Work through these in order. Do not skip ahead. Each item has an exit criterion.
3.1 Infrastructure
-
docker-compose.yml— services: timescaledb, redis, airflow, dashboard -
db/schema.sql— TimescaleDB hypertables + PostgreSQL tables (seeSCHEMA.md) -
db/migrations/— Alembic setup with initial migration applyingschema.sql -
.env.example— all required env vars documented -
Makefile— all targets listed inREADME.md -
requirements.txt— pinned versions matching the stack table inREADME.md
Exit criterion: make dev starts all services healthy. make schema applies migration successfully.
3.2 Ingestion Layer
-
ingestion/base.py—BaseIngestorabstract class with the three required methods -
ingestion/polymarket.py— implementsBaseIngestorfor Polymarket CLOB -
ingestion/binance.py— implementsBaseIngestorinterface pattern for BTC/USDT 1s OHLCV
Exit criterion: make ingest DATE=<any recent date> populates trades and reference_prices tables with no errors. At least 200 slot rows result from a single day.
3.3 Slot Constructor
-
pipeline/stages/p3_slots.py— constructsslot_masterfrom raw trades
Exit criterion: All 26 columns in slot_master are populated. data_quality_flag=0 rows have no nulls in core fields. BTC return sign matches outcome_yes for all flag=0 rows.
3.4 Oracle Engine
-
oracle/bounds.py— vectorised Polars oracle computation -
oracle/breakeven.py— SciPy break-even solver -
tests/test_oracle.py— all formula unit tests (see Section 6 of this file)
Exit criterion: All oracle unit tests pass (100%). oracle_bounds table populated for all slots. Runtime < 5s for 10,000 slots.
3.5 Monte Carlo
-
montecarlo/spectrum.py— Numba JIT investor spectrum
Exit criterion: For p=1.0 (oracle): E[R_net] > 0. For p=0.5 (random): E[R_net] < 0. Ruin probability monotonically decreasing in p. Runtime < 60 min for 10k runs on available hardware.
3.6 Statistical Analysis
-
stats/regression.py— IER OLS regression
Exit criterion: Regression runs without error. Outputs LaTeX table stub to paper/tables/.
3.7 Dashboard
-
viz/dashboard/app.py— Plotly Dash live dashboard
Exit criterion: Dashboard loads at :8050. IER time-series renders. Break-even p* displays in plain language.
3.8 Airflow DAG
-
pipeline/dags/dag_btc_pilot.py— full P0–P9 DAG
Exit criterion: Airflow UI shows green DAG for a 7-day backfill. All stages complete without error.
3.9 Reproducibility
-
make reproducetarget working end-to-end
Exit criterion: make reproduce succeeds in a clean container from only raw API data.
4. Architecture Rules (Non-Negotiable)
4.1 Class-Agnostic Code
Every module in oracle/, montecarlo/, stats/, and viz/ must work on the slot_master schema regardless of which class or platform produced the data. No class-specific logic outside ingestion/ and config/.
# WRONG — class-specific logic in oracle
if class_code == "C1":
fee = 0.02
# CORRECT — read fee from the slot record
fee = slot["fee_rate"]
4.2 Ingestor Interface
Every ingestor MUST implement BaseIngestor. Do not write a standalone script that directly writes to the DB. The interface is:
# ingestion/base.py
class BaseIngestor(ABC):
@abstractmethod
def fetch_trades(self, market_id: str, date: date) -> pl.DataFrame:
"""Returns a DataFrame matching the raw_trades schema."""
@abstractmethod
def fetch_resolution(self, market_id: str) -> Resolution:
"""Returns the resolution record for a market."""
@abstractmethod
def get_market_ids(self, class_code: str, date: date) -> list[str]:
"""Returns all active market IDs for a class on a given date."""
4.3 No Raw SQL in Business Logic
All database access goes through db/queries.py. No psycopg2.execute() calls in pipeline stage files. No inline SQL strings in oracle/, stats/, or viz/.
4.4 Polars, Not Pandas
Use Polars everywhere for DataFrame operations. Pandas is not in requirements.txt. If you find yourself writing pd.DataFrame, stop and rewrite in Polars.
4.5 Idempotency
Every pipeline stage must be safe to re-run. All DB writes must use INSERT ... ON CONFLICT DO UPDATE (upsert). Never DELETE + INSERT. The primary key for slot_master is slot_id (SHA-256 hash — stable across re-ingests).
4.6 Seeded Randomness
Every random operation must accept and use an explicit seed. No np.random.seed() global calls. Use np.random.default_rng(seed) for NumPy and pass seed explicitly to Numba.
4.7 Type Annotations
All functions must have complete type annotations. Return types are mandatory. Use from __future__ import annotations at the top of each file.
5. Data Rules
5.1 The slot_master Schema Is Authoritative
The canonical schema is in SCHEMA.md and db/schema.sql. If there is any conflict between these files, db/schema.sql wins. Do not add columns to slot_master without updating both files and creating an Alembic migration.
5.2 q* Assignment Rule
q_star_winner is the minimum trade price of the ex-post winning token during the slot window [t0_utc, t_close_utc). It is computed AFTER resolution is known. It is NEVER the minimum price of both tokens.
# CORRECT
winning_token = "yes" if outcome_yes else "no"
q_star = trades.filter(pl.col("token") == winning_token)["price"].min()
# WRONG — minimum of all trades
q_star = trades["price"].min()
5.3 Quality Flags
| Flag | Meaning | Include in main analysis? |
|---|---|---|
| 0 | Clean | Yes |
| 1 | Missing ticks (>20% gaps or n_trades < 5) | Sensitivity check only |
| 2 | Resolution disputed / BTC sign mismatch | Exclude, report in appendix |
| 3 | Manual exclusion | Exclude always |
5.4 Division by Zero Guard
q_star_winner must never be zero. Apply this guard in the slot constructor:
q_star = max(q_star_raw, 0.001)
if q_star_raw < 0.01:
quality_flag = max(quality_flag, 1)
6. Oracle Formula Tests (Required)
These specific tests MUST exist in tests/test_oracle.py and MUST pass before any oracle code is considered done.
# Test 1: Layer-1 oracle (α=1, perfect outcome)
# Input: q0=0.50, q*=0.35, f=0.02, τ=0.26
# Expected: R_L1 = (1/0.50) * (1-0.02) * (1-0.26) - 1 = 0.4504
assert abs(R_L1(q0=0.50, q_star=0.35, f=0.02, tau=0.26, alpha=1.0) - 0.4504) < 1e-4
# Test 2: Layer-2 oracle (α=0, perfect outcome + timing)
# Input: q0=0.50, q*=0.35, f=0.02, τ=0.26
# Expected: R_L2 = (1/0.35) * (1-0.02) * (1-0.26) - 1 = 1.0721
assert abs(R_L2(q0=0.50, q_star=0.35, f=0.02, tau=0.26) - 1.0721) < 1e-4
# Test 3: Timing premium
# ΔR = R_L2 - R_L1 = (1/0.35 - 1/0.50) * (1-0.02) * (1-0.26)
assert abs(delta_R_timing(q0=0.50, q_star=0.35, f=0.02, tau=0.26) - 0.6217) < 1e-4
# Test 4: Break-even at α=1 (near-50/50 market, Italian tax)
# p* = 0.50 / [(1-0.02)*(1-0.26)] = 0.50 / 0.7252 = 0.6895
assert abs(break_even_p(q0=0.50, q_star=0.35, f=0.02, tau=0.26, alpha=1.0) - 0.6895) < 1e-4
# Test 5: Random investor always loses money (p=0.5, any α, any positive f)
# E[R_net](p=0.5, α=1) with f=0.02, τ=0 should be negative
assert expected_return(p=0.5, alpha=1.0, q0=0.50, q_star=0.35, f=0.02, tau=0.0) < 0
# Test 6: Oracle at α=0 always beats oracle at α=1 (q* < q0)
assert R_L2(q0=0.50, q_star=0.35, f=0.02, tau=0.0) > R_L1(q0=0.50, q_star=0.35, f=0.02, tau=0.0, alpha=1.0)
# Test 7: Fee drag is multiplicative, not additive
# (1-f)(1-τ) < (1-f-τ) for f,τ > 0
f, tau = 0.02, 0.26
assert (1-f)*(1-tau) < (1-f-tau+f*tau) # always true; confirms compound is worse
# Test 8: Division by zero guard (q*=0 must be caught upstream)
# The oracle function should raise ValueError for q_star <= 0
import pytest
with pytest.raises(ValueError):
R_L2(q0=0.50, q_star=0.0, f=0.02, tau=0.26)
# Test 9: IER is finite and meaningful when R_L2 > 0
# Note: IER = E[R_crowd]/R_L2. Since E[R_crowd] < 0 (fees drain crowd), IER < 0.
ier = compute_ier(r_crowd=-0.2748, r_l2=1.0721)
assert ier < 0 # Crowd always loses money relative to oracle
# Test 10: Parametric oracle interpolates between L1 and L2
r_l1 = R_L1(q0=0.50, q_star=0.35, f=0.02, tau=0.26, alpha=1.0)
r_l2 = R_L2(q0=0.50, q_star=0.35, f=0.02, tau=0.26)
for alpha in [0.0, 0.25, 0.5, 0.75, 1.0]:
r = R_parametric(q0=0.50, q_star=0.35, f=0.02, tau=0.26, alpha=alpha)
assert r_l2 >= r >= r_l1 - 1e-9 # monotone in alpha
7. File-by-File Implementation Notes
ingestion/polymarket.py
- Use
httpx.AsyncClientfor all REST calls. Never userequests(blocking). - The CLOB API endpoint for trade history:
GET https://clob.polymarket.com/trades?market={condition_id}&after={timestamp} - For resolution, cross-check with the Polygon UMA oracle contract. The
UmaCtfAdaptercontract address is inconfig/markets.yaml. - Rate limit: max 10 requests/second. Implement token-bucket limiter.
- All HTTP errors must be caught, logged, and retried (max 3 attempts, exponential backoff with jitter).
- Store raw API responses as JSON to
data/raw/{date}/{market_id}.jsonbefore parsing. This enables re-parsing without re-fetching.
ingestion/binance.py
- Use
ccxt.binanceasync client. Symbol:BTC/USDT. - Timeframe:
1s(1-second candles). Fetch all candles in[t0_utc - 10s, t_close_utc + 10s]for each slot. - The extra ±10s buffer ensures alignment with Polymarket slot boundaries.
- If
1scandles are unavailable for historical dates, fall back to1mcandles and interpolate linearly. Flag any interpolated slot withdata_quality_flag = max(flag, 1).
oracle/bounds.py
- Must use Polars LazyFrame throughout. Never call
.collect()until the final write step. - The alpha grid and tax grid must be defined as module-level constants, not hardcoded in functions.
- The IER formula:
IER = R_crowd / R_L2.R_crowdis the expected return of an investor who bets atq0proportionally toq0itself (treats market as fair). For a correct prediction:R_crowd = (1/q0)(1-f)(1-τ) - 1. E[R_crowd] = q0 · R_crowd_correct + (1-q0) · (-1). Simplify:E[R_crowd] = (1-f)(1-τ) - 1 + f(1-τ)q0 + ...— seedocs/MATH_MODEL.mdfor the full expression. - Never use Python loops over DataFrame rows. All computation must be Polars expressions.
montecarlo/spectrum.py
- The Numba function signature MUST match exactly what
pipeline/stages/p6_montecarlo.pypasses. Seedocs/MATH_MODEL.mdSection 4 for the exact expected inputs. - Pre-warm the Numba JIT cache at module import time using a tiny 10-slot synthetic dataset. This prevents 5-minute cold-start in the Airflow DAG.
- Output format: HDF5 file at
data/mc_results/{run_id}.h5with groupswealth_paths,summary_stats,parameters.
viz/dashboard/app.py
- The plain-language break-even sentence must read: "To break even in this market today [jurisdiction], you need to be right on more than X% of your bets."
- The (p, α) heatmap must use a diverging colorscale: red (negative E[R_net]) → white (break-even) → green (positive).
- The IER panel must show: raw IER per slot (scatter, light), 7-day rolling mean (line, bold), and ±1σ band.
pipeline/stages/p3_slots.py
- The
slot_idmust be:hashlib.sha256(f"{platform}:{market_id}:{t0_utc.isoformat()}".encode()).hexdigest() - This must be deterministic and stable — the same slot ingested twice must produce the same
slot_id. - The BTC return sign check: if
btc_return_pct > 0.1(>0.1% up, in percent units) andoutcome_yes == False, setdata_quality_flag = 2. The 0.1 threshold (in percent) avoids flagging near-zero moves where direction is ambiguous.
8. What NOT To Do
- Do not write a single-file script that does ingestion + computation + output. Follow the module structure.
- Do not use pandas. Polars only.
- Do not put secrets in code or config files. All secrets go in
.env. - Do not add class-specific logic to
oracle/,montecarlo/,stats/, orviz/. Those modules must not import fromingestion/. - Do not call
.collect()on a Polars LazyFrame more than once per pipeline stage. - Do not skip writing the unit test first if you are implementing any formula.
- Do not use
globalstate or module-level mutable variables. - Do not catch broad
Exceptionwithout re-raising or logging with full traceback. - Do not write to
slot_masterfrom any module other thanpipeline/stages/p3_slots.py. - Do not delete and re-insert data. Upsert only.
- Do not implement Phase 1 features. Stubs are acceptable (raise
NotImplementedError), but no working code for classes other than C1 in Phase 0.
9. Testing Requirements
- All tests live in
tests/. No test files elsewhere. - Run with:
make test(which runspytest tests/ --cov=. --cov-report=term-missing) - Coverage requirements:
oracle/bounds.py: 100% (no exceptions)oracle/breakeven.py: 100%montecarlo/spectrum.py: ≥90%pipeline/stages/p3_slots.py: ≥90%- All other modules: ≥80%
- Every test that touches the oracle formula must use the exact numerical cases from Section 6.
- Integration tests (those requiring a live DB) must be marked
@pytest.mark.integrationand skipped in CI unlessINTEGRATION_TESTS=1is set.
10. Logging and Error Handling
Use Python's logging module. No print() statements in production code (only in __main__ blocks).
import logging
logger = logging.getLogger(__name__)
# In each pipeline stage:
logger.info("P3 slot constructor: processing %d trades for date %s", n_trades, date)
logger.warning("Slot %s: q_star < 0.01, setting quality_flag=1", slot_id)
logger.error("P1 ingestion failed for market %s: %s", market_id, exc, exc_info=True)
Airflow task failure policy: retry 3 times with 5-minute delay. On third failure, send alert (configured via Airflow email connection).
11. Commit and PR Guidelines
- Commit messages:
type(scope): description— e.g.,feat(oracle): add vectorised alpha grid expansion - Types:
feat,fix,test,refactor,docs,chore - One logical change per commit
- Before any PR:
make testandmake lintmust both pass - Every new formula implementation needs a corresponding test in the same commit
12. How to Verify Your Work End-to-End
After completing Phase 0, run this validation sequence:
# 1. Clean environment
docker-compose down -v && docker-compose up -d
make schema
# 2. Ingest one week of BTC data
make backfill START=2026-03-01 END=2026-03-08
# 3. Run oracle computation
make oracle
# 4. Spot-check break-even (should be ~0.69 for IT, α=1, BTC near 50/50)
psql $DB_URL -c "SELECT alpha, tau_label, p_star FROM break_even_surface WHERE class_code='C1' ORDER BY alpha, tau_label;"
# 5. Run Monte Carlo
make mc SEED=42
# 6. Check: oracle (p=1) must have E[R_net] > 0; random (p=0.5) must have E[R_net] < 0
# (Check mc_results HDF5 summary_stats group)
# 7. Run statistics
make analyse
# 8. Verify dashboard
make dashboard
# → Open :8050 and confirm: IER time-series visible, p* sentence displays
# 9. Generate paper outputs
make figures
ls paper/tables/ paper/figures/ # must be non-empty
# 10. Full reproducibility test
make reproduce
If all 10 steps succeed, Phase 0 is complete.