---
name: explore-dnn-model
description: Manual invocation only; use only when the user explicitly requests explore-dnn-model by name. Explore how to run a given DNN model checkpoint in the current Python environment by locating weights + upstream source code, resolving dependencies with user confirmation, running reproducible experiments under tmp/, and producing reports about I/O contracts, timing, and profiling.
---
# Explore DNN Model
## Minimum Required Inputs (Hard Requirement)
To use this skill, the user must provide:
- A model checkpoint / model file(s) as a local file or directory path (it may be outside the workspace).
If the user provides only the checkpoint path (no model name, repo link, or source code), proceed by:
- Attempting to identify the model name/family from the checkpoint file/dir itself (filenames, adjacent configs/README, embedded metadata, `state_dict` key patterns, etc.; see the sketch after this list).
- Searching for the implementation in the workspace and/or alongside the checkpoint directory (e.g., nearby Python packages, inference scripts, config files).
- If still not found, using the best-guess model name/family to search online for the canonical implementation, then cloning the upstream source into `tmp/<experiment-dir>/refs/` for investigation (prefer shallow clone; record URL + commit/tag used).
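For PyTorch-style checkpoints, a minimal sketch of the key-inspection step (the path is a placeholder; assumes `torch` is importable and the checkpoint is trusted, since `torch.load` unpickles arbitrary objects):

```python
# Minimal sketch: surface state_dict key patterns from a .pt/.pth checkpoint.
# Trusted checkpoints only; weights_only=False allows pickle execution.
from pathlib import Path

import torch

def inspect_checkpoint(ckpt_path: str) -> None:
    obj = torch.load(ckpt_path, map_location="cpu", weights_only=False)
    state = obj.get("state_dict", obj) if isinstance(obj, dict) else obj
    if hasattr(state, "state_dict"):  # a full nn.Module was pickled
        state = state.state_dict()
    print(f"{Path(ckpt_path).name}: {len(state)} top-level entries")
    for key in list(state)[:30]:      # key prefixes often reveal the family
        print("  ", key)

inspect_checkpoint("path/to/checkpoint.pt")
```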
## Goals
This skill has three goals:
- Verify that the given DNN model can work (inference or training; default focus is inference) in the current Python environment of the workspace.
- Determine how to use it (inference or training; default is inference) by reading the upstream source code and producing minimal, reproducible runs.
- Produce two reports:
  - Experiment report (programmatic): generated from `tmp/<experiment-dir>/outputs/` with minimal/no reasoning.
  - Stakeholder report (agent-written): generated by the agent from the experiment report + outputs/logs, with deeper analysis and recommendations.
The reports cover:
- Input and output contracts (formats, shapes, dtypes, preprocessing/postprocessing)
- Benchmarks and performance profiling (latency/throughput/memory, device details)
- User-provided metrics/targets (e.g., accuracy, mAP, IoU, F1, latency budget), and whether/how they are met
Before changing anything, detect how the environment is managed by checking for:
- `pixi.toml` and/or `pyproject.toml` (Pixi-managed project)
- `.venv/` (venv-managed project)
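A minimal detection sketch under those conventions (assumes Python 3.11+ for `tomllib`, and that a `pyproject.toml` counts as Pixi-managed only when it carries a `[tool.pixi]` table):

```python
# Minimal sketch: classify the workspace's environment manager.
import tomllib
from pathlib import Path

def detect_env(ws: Path = Path(".")) -> str:
    if (ws / "pixi.toml").exists():
        return "pixi"
    pyproject = ws / "pyproject.toml"
    if pyproject.exists():
        data = tomllib.loads(pyproject.read_text())
        if "pixi" in data.get("tool", {}):  # Pixi config embedded in pyproject
            return "pixi"
    if (ws / ".venv").is_dir():
        return "venv"
    return "unknown"

print(detect_env())
```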
## Dependency Policy (Ask Once, Then Apply)
If any dependency is missing:
- Do not install it automatically without user confirmation.
- List the missing packages (and versions/constraints if known) and ask the developer how to proceed.
- Provide clear options, let the developer choose, then proceed with the chosen approach.
- Once the developer confirms an approach, apply it for all newly required packages (no need to ask approval per package).
### Version Strategy
- First attempt: use the latest versions resolved by the selected package manager (`pixi`, `pip`, `uv`).
- If that fails (import/runtime errors, incompatibilities): fall back to the specific versions/constraints documented by the model’s upstream source code or docs.
### Preferred Options (in order)
#### Pixi-managed env
- Ask the user to choose one:
  - Modify the current Pixi environment by adding deps to the relevant manifest (`pixi.toml`/`pyproject.toml`).
  - Create a new Pixi environment specifically to test this model.
- Then use `pixi install`/`pixi run ...` to execute.
- Prefer PyPI packages over conda-forge when both are available.
- Avoid direct `pip install ...` into the Pixi environment unless the developer explicitly requests it.
#### .venv-managed env
Ask the user to choose one:
- Install deps via `pip` (or `uv pip`) into the current `.venv`.
- Create a new venv specifically for this model (keeps the repo venv clean).
## Inputs to Collect (ask if missing)
- Model name and/or upstream repo link and/or source code path (optional but speeds up identification)
- Model task/modality if unclear (classification/detection/segmentation/embedding/audio/video/etc.)
- Checkpoint path (file/dir) and format (`.pt`, `.pth`, `.onnx`, `.engine`, etc.)
- Any known I/O contract details (expected resolution, channel order, normalization, label mapping), if the user has them
- CPU-only requirement (only if the user explicitly requests CPU-only)
- Optional: user-provided metrics/targets to evaluate (quality and/or performance)
Notes:
- Determine framework/runtime automatically from checkpoint type + upstream code/docs + what’s available in the current Python environment.
- If hardware is unspecified, default to using hardware acceleration when available (CUDA GPU, ROCm GPU, Apple MPS, etc.). Use CPU-only only if the user requested it.
- If unspecified, the default objective is to confirm the model runs end-to-end from input → output (prefer real inputs found in the workspace; synthesize as a fallback) and record end-to-end timing.
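A minimal device-selection sketch for the default above, assuming a PyTorch runtime (ROCm builds of PyTorch also report availability through `torch.cuda`):

```python
# Minimal sketch: prefer hardware acceleration unless CPU-only was requested.
import torch

def pick_device(cpu_only: bool = False) -> torch.device:
    if cpu_only:
        return torch.device("cpu")
    if torch.cuda.is_available():          # CUDA, and ROCm builds of PyTorch
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```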
## Core Workflow
### 0) Confirm artifacts and pick the target environment
- Confirm the minimum required inputs are present:
  - Checkpoint/model path is accessible locally (file/dir exists). It may be outside the workspace.
  - If model name/repo/source path is not provided, start by inferring it from the checkpoint and nearby files; if needed, locate it online and clone into `tmp/<experiment-dir>/refs/`.
- Detect environment type:
  - If both Pixi and `.venv` exist, ask the user which one should be treated as the “current” environment for this exploration.
- Device default:
  - If the user did not request CPU-only, use hardware acceleration when available (CUDA/ROCm/MPS/etc.).
### 1) Locate and read the upstream source code/docs
- First try to find the implementation locally:
  - Search the workspace and the checkpoint directory for source code, inference scripts, configs, and docs.
  - Prefer local source if it appears to be the canonical/official implementation for the checkpoint.
- If local source is not available or is clearly incomplete, use online search to find the canonical implementation:
  - Official GitHub repo, paper, model card, or vendor docs.
  - Check out the upstream repo under `tmp/<experiment-dir>/refs/<repo-name>` using a shallow clone (`--depth=1`), pinning a tag/commit when possible.
- Download/check out the relevant source code (pin a tag/commit when possible) and identify:
  - The exact inference entrypoints (scripts/modules), model class, preprocessing, postprocessing, and label mapping.
  - Any config files required to construct the model (YAML/JSON/TOML).
- Do not “guess” preprocessing/postprocessing: confirm from code and/or reference examples.
### 2) Derive required dependencies
Before running the model or changing the environment, determine the minimal dependencies required to run the model by using (in priority order):
- Upstream source code (setup files, `requirements*.txt`, `pyproject.toml`, import graph).
- Upstream docs/model card (pinned versions, known-good combos).
- Checkpoint type (e.g., `.onnx` implies ONNX Runtime; `.pt`/`.pth` implies PyTorch; `.engine` implies TensorRT; see the heuristic sketch below).
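A heuristic sketch of that suffix-to-runtime mapping; the table covers only the suffixes named above and is a first guess to confirm against upstream code/docs:

```python
# Heuristic sketch: map checkpoint suffix to a likely runtime family.
# Suffixes can be misleading; always confirm against the upstream source.
from pathlib import Path

RUNTIME_BY_SUFFIX = {
    ".pt": "pytorch",
    ".pth": "pytorch",
    ".onnx": "onnxruntime",
    ".engine": "tensorrt",
}

def infer_runtime(ckpt: str) -> str:
    return RUNTIME_BY_SUFFIX.get(Path(ckpt).suffix.lower(), "unknown")

print(infer_runtime("model.onnx"))  # -> onnxruntime
```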
Make a concise dependency list covering:
- Runtime/framework (e.g., `torch`, `onnxruntime`, `opencv-python`)
- Model-specific libs (e.g., `ultralytics`, `timm`, `transformers`, `mmengine`, etc.)
- Utility deps used by the official inference path (e.g., `numpy`, `Pillow`, `pyyaml`)
- Optional acceleration deps (CUDA/TensorRT) separated from the CPU baseline
### 3) Resolve missing dependencies (with user choice)
- Check whether each required dependency is available in the current environment.
- If anything is missing, ask the user which path to take:
  - Pixi: modify current manifest to add deps, or create a new Pixi env for this model.
  - Venv: install into the current `.venv`, or create a new venv for this model.
- After the user confirms, apply the decision for all required packages (no per-package prompts).
- Use the Version Strategy above (latest first; fall back to pinned versions if needed).
- After dependency changes, run a quick smoke test (see the sketch below):
  - Imports for the core runtime stack
  - Minimal “load model” path (without a full benchmark yet)
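A minimal smoke-test sketch, assuming a PyTorch checkpoint; other runtimes substitute their own load call (e.g., `onnxruntime.InferenceSession(ckpt_path)` for `.onnx`). The path is a placeholder:

```python
# Minimal smoke test: imports work, and the checkpoint loads. No inference yet.
import torch

def smoke_test(ckpt_path: str) -> None:
    print("torch", torch.__version__, "| cuda:", torch.cuda.is_available())
    obj = torch.load(ckpt_path, map_location="cpu", weights_only=False)
    n = len(obj) if isinstance(obj, dict) else "n/a"
    print(f"loaded {type(obj).__name__} ({n} top-level entries)")

smoke_test("path/to/checkpoint.pt")
```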
### 4) Ensure the checkpoint exists locally
- Do not download checkpoints automatically.
- Developers must provide checkpoints/model files (local file/dir paths).
- If the checkpoint is missing or only a URL is provided, ask the developer to download it and provide the local path.
- If the developer wants a conventional location, prefer `checkpoints/` (gitignored).
- Record provenance in a short note based on what the developer provides (see the sketch below):
  - Claimed source URL(s) or repo, version/commit/tag (if known), file size, and (if feasible) SHA256.
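A minimal provenance sketch; the `claimed_source` value and file paths are placeholders the developer supplies:

```python
# Minimal sketch: record size + SHA256, streamed so large checkpoints are OK.
import hashlib
import json
from pathlib import Path

def provenance(ckpt: str, claimed_source: str = "<url-or-repo>") -> dict:
    path = Path(ckpt)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return {
        "file": str(path),
        "claimed_source": claimed_source,  # as provided by the developer
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
    }

print(json.dumps(provenance("checkpoints/model.pt"), indent=2))
```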
### 5) Create an experiment workspace under `tmp/`
Default experiment directory: `<workspace>/tmp/<experiment-slug>-<time>`
If the user specifies a different location/name, use the user-provided one instead.
Create the standard directory layout:
```
tmp/<experiment-dir>/
  README.md              # experiment intent + directory guide (keep updated)
  refs/                  # checked-out upstream repos (use shallow clone for online checkouts)
    README.md
  scripts/               # throwaway but reproducible scripts (committed if useful)
    README.md
  inputs/                # downloaded/synthesized test inputs
    README.md
  outputs/               # artifacts + machine-readable stats (e.g., stats.json)
    README.md
  logs/                  # logs (stdout/stderr, profiling traces, command transcripts)
    README.md
  reports/               # markdown notes: what was tried, params, results
    README.md
    figures/             # images embedded in reports
    experiment-report.md
    stakeholder-report.md
```
Shell safety note (avoid accidental directory names):
- Do not use bash brace expansion to create these folders (e.g., `mkdir -p "$exp"/{refs,scripts,...}`), because quoting/spacing mistakes can create literal directories like `{refs,scripts,...}`.
- Prefer a simple loop or explicit `mkdir -p` calls, for example:

```bash
exp="tmp/<experiment-dir>"
mkdir -p "$exp"
for d in refs scripts inputs outputs logs reports reports/figures; do
  mkdir -p "$exp/$d"
done
```
Conventions:
- Use relative paths from `tmp/<experiment-dir>` in scripts so the folder is movable.
- Keep scripts small and single-purpose (`01_download_inputs.py`, `10_infer.py`, `20_visualize.py`, …).
- Run Python via the selected environment manager:
  - Pixi: `pixi run python ...`
  - Venv: use the venv’s Python (avoid system Python)
README requirements:
- Create `tmp/<experiment-dir>/README.md` to describe:
  - The intention of the experiment (what model, what checkpoint, what question you’re answering)
  - How to reproduce (one-line pointer to the primary script(s))
  - A brief map of what each top-level subdir contains
- Each top-level subdir must have its own `README.md` that:
  - Describes what belongs in the folder
  - Notes any important changes (append a short “Changes” section as you iterate)
### 6) Collect or synthesize inputs
- First try to find suitable inputs already present in the workspace (e.g., under `datasets/`, `downloads/`, or other project-specific data dirs) based on what you learned from the checkpoint/source code (task, modality, expected resolution, file types).
- If no suitable inputs exist locally, synthesize minimal inputs that satisfy the model contract (e.g., generated images, random tensors saved in the expected container format, short synthetic video; see the fallback sketch below).
- Save all chosen/generated inputs under `tmp/<experiment-dir>/inputs/`.
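A fallback sketch for synthesizing an image input; the 640x640 RGB contract here is an illustrative assumption, not the model’s actual contract:

```python
# Fallback sketch: synthesize one test image when no real inputs exist.
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)  # seeded so reruns produce identical inputs
pixels = rng.integers(0, 256, size=(640, 640, 3), dtype=np.uint8)
Image.fromarray(pixels, mode="RGB").save("inputs/synthetic_640.png")
```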
### 7) Run minimal, traceable inference experiments (default: inference + end-to-end timing)
- Start with a single known-good example (from upstream repo) if available.
- Save every “input → output” mapping:
  - Inputs: the exact file(s) used + preprocessing parameters.
  - Outputs: raw model outputs + any decoded/visualized artifacts.
  - Command line + environment notes (device, precision, batch size).
- Measure end-to-end timing by default (see the timing sketch after this list):
  - At minimum: one cold run + a small number of warm runs (record mean/median).
- Persist stats that will appear in the report:
  - For any timing/profiling/memory/throughput numbers you plan to put into the report, also write a JSON version under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- Capture logs by default:
  - Save stdout/stderr and command transcripts under `tmp/<experiment-dir>/logs/`.
- If the model is accessed via HTTP/gRPC, save request/response payloads (sanitized) under `reports/` and/or `outputs/`.
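A timing-harness sketch matching the defaults above; `run_inference` is a hypothetical stand-in for the model’s real entrypoint, and the script assumes it runs from `tmp/<experiment-dir>/`:

```python
# Timing sketch: one cold run + warm runs, persisted to outputs/stats.json.
import json
import statistics
import time

def benchmark(run_inference, warm_runs: int = 10) -> dict:
    # Note: when timing GPU inference with PyTorch, call
    # torch.cuda.synchronize() before reading the clock so queued
    # kernels are included in the measurement.
    start = time.perf_counter()
    run_inference()                      # cold run (includes lazy init)
    cold_s = time.perf_counter() - start
    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_inference()
        warm.append(time.perf_counter() - start)
    return {
        "cold_s": cold_s,
        "warm_mean_s": statistics.mean(warm),
        "warm_median_s": statistics.median(warm),
    }

stats = benchmark(lambda: time.sleep(0.01))  # replace with the real call
with open("outputs/stats.json", "w") as f:
    json.dump(stats, f, indent=2)
```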
### 7b) (Optional) Training sanity check
If the user asks to validate training (or if inference is insufficient to validate “works”):
- Start with a minimal configuration (single batch / tiny subset) to confirm the forward + backward pass runs.
- Record key configs (optimizer, LR, batch size, mixed precision) and any dataset assumptions.
- Do not run long trainings unless the user explicitly requests it.
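A minimal sketch of that forward + backward check, using a stand-in `torch.nn.Linear` and a synthetic batch where the real upstream model class and data would go:

```python
# Single-step training sanity check on a tiny synthetic batch.
import torch

model = torch.nn.Linear(16, 4)         # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(2, 16)                 # tiny synthetic batch
y = torch.randint(0, 4, (2,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()                        # confirms gradients flow
optimizer.step()
print("one training step ok; loss =", float(loss))
```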
### 8) Produce reports
#### 8a) Ensure machine-readable report inputs exist (in `outputs/`)
Write/collect machine-readable files in `tmp/<experiment-dir>/outputs/` that the report generator can consume, at minimum:
- `stats.json` (timing/throughput/memory/profile numbers)
- A JSON describing key parameters used (preprocess/postprocess/runtime thresholds)
- A JSON describing the I/O contract (input expectations + output structure)
- A JSON listing key artifacts produced (paths to representative inputs/outputs)
Keep these JSON files as the source of truth for anything that will appear as “final stats” in the experiment report.
#### 8b) Generate `reports/experiment-report.md` programmatically
- Generate `tmp/<experiment-dir>/reports/experiment-report.md` by reading only `tmp/<experiment-dir>/outputs/` (and optionally `logs/` for pointers), with minimal/no reasoning (see the sketch below).
- If images are part of the inputs/outputs, copy representative images into `tmp/<experiment-dir>/reports/figures/` and embed them in the markdown via relative paths (e.g., `figures/<name>.png`).
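One possible shape for the generator, assuming it runs from `tmp/<experiment-dir>/` and simply echoes every `outputs/*.json` into markdown so the report stays a mechanical view of the recorded stats:

```python
# Report-generator sketch: no reasoning, just render outputs/*.json verbatim.
import json
from pathlib import Path

outputs = Path("outputs")
lines = ["# Experiment Report", ""]
for json_file in sorted(outputs.glob("*.json")):
    body = json.dumps(json.loads(json_file.read_text()), indent=2)
    lines += [f"## {json_file.stem}", "```json", body, "```", ""]
Path("reports/experiment-report.md").write_text("\n".join(lines))
```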
#### 8c) Write `reports/stakeholder-report.md` (agent-written)
- Read `reports/experiment-report.md` plus relevant `outputs/` and `logs/`.
- Produce `tmp/<experiment-dir>/reports/stakeholder-report.md` with deeper analysis that requires reasoning:
  - Interpret results vs expectations/targets
  - Call out risks, assumptions, and failure modes
  - Recommend next experiments and concrete integration guidance (if requested)
  - Summarize “go/no-go” criteria and what remains unknown
Also include:
- Benchmark & profiling results:
  - CPU/GPU model, RAM/VRAM, OS, Python version, key library versions
  - Latency breakdown if possible (preprocess / model / postprocess)
  - Throughput (items/s) and peak memory/VRAM
- Stats JSON:
  - For any stats included in the report, ensure the same values exist in a JSON file under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- User metrics (if provided):
  - The metric definition + measurement method
  - Results on the chosen evaluation inputs
  - Any deltas vs the user’s targets and suggested next experiments
## Guardrails
- Do not commit large checkpoints or huge outputs; keep them under gitignored paths (`checkpoints/`, `tmp/`).
- Respect upstream licenses; record the repo URL + commit/tag in `reports/`.
- Avoid modifying runtime code under `src/` unless the user explicitly requests integration; keep exploration isolated to `tmp/<experiment-dir>`.