Repository Guidelines
Project Structure & Module Organization
Keep the layout aligned with PRD.md. Library-style pipeline logic belongs in pipelines/, while thin CLI wrappers live in scripts/. Harvested inputs flow from data/legacy_ro/ (read-only) into data/intake/ with hashes recorded in manifests/harvest_manifest.csv; intermediate and blessed outputs go to data/interim/ and data/processed/. QC panels land under qc/panels/, and reproducibility checks plus fixtures live in tests/. Update RUNBOOK.md whenever you add or retire a blessed command.
Build, Test, and Development Commands
Set up the environment from requirements.txt by running make env && source .venv/bin/activate. The Makefile is the authoritative interface for working commands (make env, make validate, make golden, make test-lidar, make thermal).
For iterative debugging you may invoke stage scripts directly, for example:
.venv/bin/python scripts/run_lidar_hag.py \
--data-root data/legacy_ro/penguin-2.0/data/raw/LiDAR/sample \
--out data/interim/lidar_test.json \
--emit-geojson --crs-epsg 32720 --plots --strict-outputs
Always rerun the golden AOI guardrail with .venv/bin/python -m pytest -q tests/test_golden_aoi.py before shipping changes.
Coding Style & Naming Conventions
Target Python 3.12.x, four-space indentation, and snake_case for modules, files, functions, and variables; reserve PascalCase for dataclasses or typed containers. Keep stage parameters and IO contracts encapsulated inside pipelines/<stage>.py, exposing a single run() entry point that the script imports. Run ruff check and ruff format; add type hints and short docstrings noting inputs, side effects, and outputs.
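The stage layout described above might look roughly like the following sketch. Everything in it is hypothetical: the module would live at something like pipelines/lidar_hag.py, and the LidarHagParams container, its fields, and the JSON summary it writes are illustrative stand-ins, not the project's actual IO contract.

```python
"""Hypothetical stage module sketch; names and fields are illustrative only."""
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LidarHagParams:
    """Stage parameters (assumed fields); PascalCase marks a typed container."""

    data_root: Path
    out_path: Path
    crs_epsg: int = 32720


def run(params: LidarHagParams) -> Path:
    """Run the stage.

    Inputs: files directly under params.data_root (read only).
    Side effects: writes a JSON summary to params.out_path.
    Outputs: the path of the written summary.
    """
    # Sort the names so repeated runs produce byte-identical output.
    tile_names = sorted(p.name for p in params.data_root.iterdir() if p.is_file())
    summary = {"crs_epsg": params.crs_epsg, "tiles": tile_names}
    params.out_path.parent.mkdir(parents=True, exist_ok=True)
    params.out_path.write_text(json.dumps(summary, indent=2, sort_keys=True))
    return params.out_path
```

Under this shape, a wrapper in scripts/ would only parse command-line arguments, build the parameter object, and call run(), which keeps the parameters and IO contract in one place.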
Testing Guidelines
Place unit and integration tests beside the stage they exercise; name files test_<stage>.py and keep fixtures in helper modules suffixed _fixtures.py. Golden AOI tests must assert the presence, schema, and hashes (when available) of outputs such as candidates.gpkg, rollup_counts.json, and QC PNGs. Add regression coverage whenever parameters, schemas, or filenames change, and mirror new QC metrics in manifests/qc_report.md.
Commit & Pull Request Guidelines
Write concise, imperative commit subjects (stage: action, e.g. lidar: clamp hag thresholds) and include provenance notes (legacy source path, SHA) in the body when harvesting. Every PR should: (1) link the tracked task or issue, (2) summarize parameter or schema changes, (3) attach before/after QC thumbnails when relevant, and (4) confirm make golden plus pytest ran cleanly. Never commit harvested data—reference manifest entries instead.
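The stage: action subject shape can be checked mechanically, for instance in a commit-msg hook. The sketch below is an assumption about what counts as well formed (a lowercase snake_case stage, a colon and one space, then the action); it cannot tell whether the action is actually imperative.

```python
import re

# Assumed pattern: snake_case stage, ": ", then a non-empty action.
SUBJECT_RE = re.compile(r"^[a-z][a-z0-9_]*: \S.*$")


def is_valid_subject(subject: str) -> bool:
    """Return True when a commit subject follows the stage: action shape."""
    return bool(SUBJECT_RE.match(subject))
```

With this pattern, lidar: clamp hag thresholds passes, while a subject without a stage prefix does not.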
Security & Data Handling
Treat data/legacy_ro/ as immutable. Copy artifacts through harvesting scripts only, populate manifests/harvest_manifest.csv with SHA256 and size, and keep credentials or API tokens out of the repo. Scrub logs before sharing externally and note any sensitive paths in PR descriptions.
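Recording SHA256 and size for a harvested artifact could be sketched as follows. The column names in MANIFEST_FIELDS and both function names are assumptions, since the actual header of manifests/harvest_manifest.csv is not given here; the real harvesting scripts remain the supported path.

```python
import csv
import hashlib
from pathlib import Path

# Assumed column names; the real manifest header may differ.
MANIFEST_FIELDS = ["path", "sha256", "size_bytes"]


def manifest_row(artifact: Path) -> dict[str, str]:
    """Build one manifest row with the artifact's SHA256 and byte size."""
    digest = hashlib.sha256()
    with artifact.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "path": artifact.as_posix(),
        "sha256": digest.hexdigest(),
        "size_bytes": str(artifact.stat().st_size),
    }


def append_to_manifest(manifest: Path, artifact: Path) -> None:
    """Append a row for the artifact, writing the header if the file is new."""
    is_new = not manifest.exists()
    manifest.parent.mkdir(parents=True, exist_ok=True)
    with manifest.open("a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=MANIFEST_FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(manifest_row(artifact))
```

The artifact is only opened for reading, so the sketch never modifies anything under data/legacy_ro/.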
Cursor IDE Project Rules
This repo uses Cursor Project Rules under .cursor/rules/*.mdc to keep AI-assisted edits aligned with project constraints.
- The rule 00-core-invariants.mdc is always applied and encodes non-negotiables like data/legacy_ro/ immutability, determinism, and the single sources of truth (RUNBOOK.md, docs/reports/STATUS.md, notes/pipeline_todo.md).
- Other rules are scoped by file globs (Python quality, geospatial CRS correctness, testing guardrails, git hygiene) and are attached automatically when working in matching files.