---
name: explore-dnn-model
description: Manual invocation only; use only when the user explicitly requests explore-dnn-model by name. Explore how to run a given DNN model checkpoint in the current Python environment by locating weights + upstream source code, resolving dependencies with user confirmation, running reproducible experiments under tmp/, and producing reports about I/O contracts, timing, and profiling.
---
# Explore DNN Model
## Minimum Required Inputs (Hard Requirement)
To use this skill, the user must provide:
- A model checkpoint / model file(s) as a local file or directory path (it may be outside the workspace).
If the user provides only the checkpoint path (no model name, repo link, or source code), proceed by:
- Attempting to identify the model name/family from the checkpoint file/dir itself (filenames, adjacent configs/README, embedded metadata, `state_dict` key patterns, etc.; see the sketch after this list).
- Searching for the implementation in the workspace and/or alongside the checkpoint directory (e.g., nearby Python packages, inference scripts, config files).
- If still not found, using the best-guess model name/family to search online for the canonical implementation, then cloning the upstream source into `tmp/<experiment-dir>/refs/` for investigation (prefer shallow clone; record URL + commit/tag used).
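For PyTorch-style checkpoints, a minimal sketch of the key-inspection step (the path is a placeholder; assumes `torch` is importable and the checkpoint is trusted, since `torch.load` unpickles arbitrary objects):

```python
# Minimal sketch: surface state_dict key patterns from a .pt/.pth checkpoint.
# Trusted checkpoints only; weights_only=False allows pickle execution.
from pathlib import Path

import torch

def inspect_checkpoint(ckpt_path: str) -> None:
    obj = torch.load(ckpt_path, map_location="cpu", weights_only=False)
    state = obj.get("state_dict", obj) if isinstance(obj, dict) else obj
    if hasattr(state, "state_dict"):  # a full nn.Module was pickled
        state = state.state_dict()
    print(f"{Path(ckpt_path).name}: {len(state)} top-level entries")
    for key in list(state)[:30]:      # key prefixes often reveal the family
        print("  ", key)

inspect_checkpoint("path/to/checkpoint.pt")
```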
## Goals
This skill has three goals:
- Verify that the given DNN model can work (inference or training; default focus is inference) in the current Python environment of the workspace.
- Determine how to use it (inference or training; default is inference) by reading the upstream source code and producing minimal, reproducible runs.
- Produce two reports:
  - Experiment report (programmatic): generated from `tmp/<experiment-dir>/outputs/` with minimal/no reasoning.
  - Stakeholder report (agent-written): generated by the agent from the experiment report + outputs/logs, with deeper analysis and recommendations.
The reports cover:
- Input and output contracts (formats, shapes, dtypes, preprocessing/postprocessing)
- Benchmarks and performance profiling (latency/throughput/memory, device details)
- User-provided metrics/targets (e.g., accuracy, mAP, IoU, F1, latency budget), and whether/how they are met
Before changing anything, detect how the environment is managed by checking for:
- `pixi.toml` and/or `pyproject.toml` (Pixi-managed project)
- `.venv/` (venv-managed project)
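A minimal detection sketch under those conventions (assumes Python 3.11+ for `tomllib`, and that a `pyproject.toml` counts as Pixi-managed only when it carries a `[tool.pixi]` table):

```python
# Minimal sketch: classify the workspace's environment manager.
import tomllib
from pathlib import Path

def detect_env(ws: Path = Path(".")) -> str:
    if (ws / "pixi.toml").exists():
        return "pixi"
    pyproject = ws / "pyproject.toml"
    if pyproject.exists():
        data = tomllib.loads(pyproject.read_text())
        if "pixi" in data.get("tool", {}):  # Pixi config embedded in pyproject
            return "pixi"
    if (ws / ".venv").is_dir():
        return "venv"
    return "unknown"

print(detect_env())
```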
## Dependency Policy (Ask Once, Then Apply)
If any dependency is missing:
- Do not install it automatically without user confirmation.
- List the missing packages (and versions/constraints if known) and ask the developer how to proceed.
- Provide clear options, let the developer choose, then proceed with the chosen approach.
- Once the developer confirms an approach, apply it for all newly required packages (no need to ask approval per package).
### Version Strategy
- First attempt: use the latest versions resolved by the selected package manager (`pixi`, `pip`, `uv`).
- If that fails (import/runtime errors, incompatibilities): fall back to the specific versions/constraints documented by the model’s upstream source code or docs.
### Preferred Options (in order)
#### Pixi-managed env
- Ask the user to choose one:
  - Modify the current Pixi environment by adding deps to the relevant manifest (`pixi.toml`/`pyproject.toml`).
  - Create a new Pixi environment specifically to test this model.
- Then use `pixi install`/`pixi run ...` to execute.
- Prefer PyPI packages over conda-forge when both are available.
- Avoid direct `pip install ...` into the Pixi environment unless the developer explicitly requests it.
#### .venv-managed env
Ask the user to choose one:
- Install deps via `pip` (or `uv pip`) into the current `.venv`.
- Create a new venv specifically for this model (keeps the repo venv clean).
## Inputs to Collect (ask if missing)
- Model name and/or upstream repo link and/or source code path (optional but speeds up identification)
- Model task/modality if unclear (classification/detection/segmentation/embedding/audio/video/etc.)
- Checkpoint path (file/dir) and format (`.pt`, `.pth`, `.onnx`, `.engine`, etc.)
- Any known I/O contract details (expected resolution, channel order, normalization, label mapping), if the user has them
- CPU-only requirement (only if the user explicitly requests CPU-only)
- Optional: user-provided metrics/targets to evaluate (quality and/or performance)
Notes:
- Determine framework/runtime automatically from checkpoint type + upstream code/docs + what’s available in the current Python environment.
- If hardware is unspecified, default to using hardware acceleration when available (CUDA GPU, ROCm GPU, Apple MPS, etc.). Use CPU-only only if the user requested it.
- If unspecified, the default objective is to confirm the model runs end-to-end from input → output (prefer real inputs found in the workspace; synthesize as a fallback) and record end-to-end timing.
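A minimal device-selection sketch for the default above, assuming a PyTorch runtime (ROCm builds of PyTorch also report availability through `torch.cuda`):

```python
# Minimal sketch: prefer hardware acceleration unless CPU-only was requested.
import torch

def pick_device(cpu_only: bool = False) -> torch.device:
    if cpu_only:
        return torch.device("cpu")
    if torch.cuda.is_available():          # CUDA, and ROCm builds of PyTorch
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon
        return torch.device("mps")
    return torch.device("cpu")

print(pick_device())
```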
## Core Workflow
### 0) Confirm artifacts and pick the target environment
- Confirm the minimum required inputs are present:
  - Checkpoint/model path is accessible locally (file/dir exists). It may be outside the workspace.
  - If model name/repo/source path is not provided, start by inferring it from the checkpoint and nearby files; if needed, locate it online and clone into `tmp/<experiment-dir>/refs/`.
- Detect environment type:
  - If both Pixi and `.venv` exist, ask the user which one should be treated as the “current” environment for this exploration.
- Device default:
  - If the user did not request CPU-only, use hardware acceleration when available (CUDA/ROCm/MPS/etc.).
### 1) Locate and read the upstream source code/docs
- First try to find the implementation locally:
  - Search the workspace and the checkpoint directory for source code, inference scripts, configs, and docs.
  - Prefer local source if it appears to be the canonical/official implementation for the checkpoint.
- If local source is not available or is clearly incomplete, use online search to find the canonical implementation:
  - Official GitHub repo, paper, model card, or vendor docs.
  - Check out the upstream repo under `tmp/<experiment-dir>/refs/<repo-name>` using a shallow clone (`--depth=1`), pinning a tag/commit when possible.
- Download/check out the relevant source code (pin a tag/commit when possible) and identify:
  - The exact inference entrypoints (scripts/modules), model class, preprocessing, postprocessing, and label mapping.
  - Any config files required to construct the model (YAML/JSON/TOML).
- Do not “guess” preprocessing/postprocessing: confirm from code and/or reference examples.
### 2) Derive required dependencies
Before running the model or changing the environment, determine the minimal dependencies required to run the model by using (in priority order):
- Upstream source code (setup files, `requirements*.txt`, `pyproject.toml`, import graph).
- Upstream docs/model card (pinned versions, known-good combos).
- Checkpoint type (e.g., `.onnx` implies ONNX Runtime; `.pt`/`.pth` implies PyTorch; `.engine` implies TensorRT; see the heuristic sketch below).
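A heuristic sketch of that suffix-to-runtime mapping; the table covers only the suffixes named above and is a first guess to confirm against upstream code/docs:

```python
# Heuristic sketch: map checkpoint suffix to a likely runtime family.
# Suffixes can be misleading; always confirm against the upstream source.
from pathlib import Path

RUNTIME_BY_SUFFIX = {
    ".pt": "pytorch",
    ".pth": "pytorch",
    ".onnx": "onnxruntime",
    ".engine": "tensorrt",
}

def infer_runtime(ckpt: str) -> str:
    return RUNTIME_BY_SUFFIX.get(Path(ckpt).suffix.lower(), "unknown")

print(infer_runtime("model.onnx"))  # -> onnxruntime
```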
Make a concise dependency list covering:
- Runtime/framework (e.g., `torch`, `onnxruntime`, `opencv-python`)
- Model-specific libs (e.g., `ultralytics`, `timm`, `transformers`, `mmengine`, etc.)
- Utility deps used by the official inference path (e.g., `numpy`, `Pillow`, `pyyaml`)
- Optional acceleration deps (CUDA/TensorRT) separated from the CPU baseline
### 3) Resolve missing dependencies (with user choice)
- Check whether each required dependency is available in the current environment.
- If anything is missing, ask the user which path to take:
  - Pixi: modify current manifest to add deps, or create a new Pixi env for this model.
  - Venv: install into the current `.venv`, or create a new venv for this model.
- After the user confirms, apply the decision for all required packages (no per-package prompts).
- Use the Version Strategy above (latest first; fall back to pinned versions if needed).
- After dependency changes, run a quick smoke test (see the sketch below):
  - Imports for the core runtime stack
  - Minimal “load model” path (without a full benchmark yet)
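A minimal smoke-test sketch, assuming a PyTorch checkpoint; other runtimes substitute their own load call (e.g., `onnxruntime.InferenceSession(ckpt_path)` for `.onnx`). The path is a placeholder:

```python
# Minimal smoke test: imports work, and the checkpoint loads. No inference yet.
import torch

def smoke_test(ckpt_path: str) -> None:
    print("torch", torch.__version__, "| cuda:", torch.cuda.is_available())
    obj = torch.load(ckpt_path, map_location="cpu", weights_only=False)
    n = len(obj) if isinstance(obj, dict) else "n/a"
    print(f"loaded {type(obj).__name__} ({n} top-level entries)")

smoke_test("path/to/checkpoint.pt")
```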
### 4) Ensure the checkpoint exists locally
- Do not download checkpoints automatically.
- Developers must provide checkpoints/model files (local file/dir paths).
- If the checkpoint is missing or only a URL is provided, ask the developer to download it and provide the local path.
- If the developer wants a conventional location, prefer `checkpoints/` (gitignored).
- Record provenance in a short note based on what the developer provides (see the sketch below):
  - Claimed source URL(s) or repo, version/commit/tag (if known), file size, and (if feasible) SHA256.
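A minimal provenance sketch; the `claimed_source` value and file paths are placeholders the developer supplies:

```python
# Minimal sketch: record size + SHA256, streamed so large checkpoints are OK.
import hashlib
import json
from pathlib import Path

def provenance(ckpt: str, claimed_source: str = "<url-or-repo>") -> dict:
    path = Path(ckpt)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return {
        "file": str(path),
        "claimed_source": claimed_source,  # as provided by the developer
        "size_bytes": path.stat().st_size,
        "sha256": digest.hexdigest(),
    }

print(json.dumps(provenance("checkpoints/model.pt"), indent=2))
```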
### 5) Create an experiment workspace under `tmp/`
Default experiment directory: `<workspace>/tmp/<experiment-slug>-<time>`
If the user specifies a different location/name, use the user-provided one instead.
Create the standard directory layout:
```
tmp/<experiment-dir>/
  README.md              # experiment intent + directory guide (keep updated)
  refs/                  # checked-out upstream repos (use shallow clone for online checkouts)
    README.md
  scripts/               # throwaway but reproducible scripts (committed if useful)
    README.md
  inputs/                # downloaded/synthesized test inputs
    README.md
  outputs/               # artifacts + machine-readable stats (e.g., stats.json)
    README.md
  logs/                  # logs (stdout/stderr, profiling traces, command transcripts)
    README.md
  reports/               # markdown notes: what was tried, params, results
    README.md
    figures/             # images embedded in reports
    experiment-report.md
    stakeholder-report.md
```
Shell safety note (avoid accidental directory names):
- Do not use bash brace expansion to create these folders (e.g., `mkdir -p "$exp"/{refs,scripts,...}`), because quoting/spacing mistakes can create literal directories like `{refs,scripts,...}`.
- Prefer a simple loop or explicit `mkdir -p` calls, for example:

```bash
exp="tmp/<experiment-dir>"
mkdir -p "$exp"
for d in refs scripts inputs outputs logs reports reports/figures; do
  mkdir -p "$exp/$d"
done
```
Conventions:
- Use relative paths from `tmp/<experiment-dir>` in scripts so the folder is movable.
- Keep scripts small and single-purpose (`01_download_inputs.py`, `10_infer.py`, `20_visualize.py`, …).
- Run Python via the selected environment manager:
  - Pixi: `pixi run python ...`
  - Venv: use the venv’s Python (avoid system Python)
README requirements:
- Create `tmp/<experiment-dir>/README.md` to describe:
  - The intention of the experiment (what model, what checkpoint, what question you’re answering)
  - How to reproduce (one-line pointer to the primary script(s))
  - A brief map of what each top-level subdir contains
- Each top-level subdir must have its own `README.md` that:
  - Describes what belongs in the folder
  - Notes any important changes (append a short “Changes” section as you iterate)
### 6) Collect or synthesize inputs
- First try to find suitable inputs already present in the workspace (e.g., under `datasets/`, `downloads/`, or other project-specific data dirs) based on what you learned from the checkpoint/source code (task, modality, expected resolution, file types).
- If no suitable inputs exist locally, synthesize minimal inputs that satisfy the model contract (e.g., generated images, random tensors saved in the expected container format, short synthetic video; see the fallback sketch below).
- Save all chosen/generated inputs under `tmp/<experiment-dir>/inputs/`.
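A fallback sketch for synthesizing an image input; the 640x640 RGB contract here is an illustrative assumption, not the model’s actual contract:

```python
# Fallback sketch: synthesize one test image when no real inputs exist.
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)  # seeded so reruns produce identical inputs
pixels = rng.integers(0, 256, size=(640, 640, 3), dtype=np.uint8)
Image.fromarray(pixels, mode="RGB").save("inputs/synthetic_640.png")
```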
### 7) Run minimal, traceable inference experiments (default: inference + end-to-end timing)
- Start with a single known-good example (from upstream repo) if available.
- Save every “input → output” mapping:
  - Inputs: the exact file(s) used + preprocessing parameters.
  - Outputs: raw model outputs + any decoded/visualized artifacts.
  - Command line + environment notes (device, precision, batch size).
- Measure end-to-end timing by default (see the timing sketch after this list):
  - At minimum: one cold run + a small number of warm runs (record mean/median).
- Persist stats that will appear in the report:
  - For any timing/profiling/memory/throughput numbers you plan to put into the report, also write a JSON version under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- Capture logs by default:
  - Save stdout/stderr and command transcripts under `tmp/<experiment-dir>/logs/`.
- If the model is accessed via HTTP/gRPC, save request/response payloads (sanitized) under `reports/` and/or `outputs/`.
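A timing-harness sketch matching the defaults above; `run_inference` is a hypothetical stand-in for the model’s real entrypoint, and the script assumes it runs from `tmp/<experiment-dir>/`:

```python
# Timing sketch: one cold run + warm runs, persisted to outputs/stats.json.
import json
import statistics
import time

def benchmark(run_inference, warm_runs: int = 10) -> dict:
    # Note: when timing GPU inference with PyTorch, call
    # torch.cuda.synchronize() before reading the clock so queued
    # kernels are included in the measurement.
    start = time.perf_counter()
    run_inference()                      # cold run (includes lazy init)
    cold_s = time.perf_counter() - start
    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        run_inference()
        warm.append(time.perf_counter() - start)
    return {
        "cold_s": cold_s,
        "warm_mean_s": statistics.mean(warm),
        "warm_median_s": statistics.median(warm),
    }

stats = benchmark(lambda: time.sleep(0.01))  # replace with the real call
with open("outputs/stats.json", "w") as f:
    json.dump(stats, f, indent=2)
```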
### 7b) (Optional) Training sanity check
If the user asks to validate training (or if inference is insufficient to validate “works”):
- Start with a minimal configuration (single batch / tiny subset) to confirm the forward + backward pass runs.
- Record key configs (optimizer, LR, batch size, mixed precision) and any dataset assumptions.
- Do not run long trainings unless the user explicitly requests it.
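A minimal sketch of that forward + backward check, using a stand-in `torch.nn.Linear` and a synthetic batch where the real upstream model class and data would go:

```python
# Single-step training sanity check on a tiny synthetic batch.
import torch

model = torch.nn.Linear(16, 4)         # stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
x = torch.randn(2, 16)                 # tiny synthetic batch
y = torch.randint(0, 4, (2,))
loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()                        # confirms gradients flow
optimizer.step()
print("one training step ok; loss =", float(loss))
```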
### 8) Produce reports
#### 8a) Ensure machine-readable report inputs exist (in `outputs/`)
Write/collect machine-readable files in `tmp/<experiment-dir>/outputs/` that the report generator can consume, at minimum:
- `stats.json` (timing/throughput/memory/profile numbers)
- A JSON describing key parameters used (preprocess/postprocess/runtime thresholds)
- A JSON describing the I/O contract (input expectations + output structure)
- A JSON listing key artifacts produced (paths to representative inputs/outputs)
Keep these JSON files as the source of truth for anything that will appear as “final stats” in the experiment report.
#### 8b) Generate `reports/experiment-report.md` programmatically
- Generate `tmp/<experiment-dir>/reports/experiment-report.md` by reading only `tmp/<experiment-dir>/outputs/` (and optionally `logs/` for pointers), with minimal/no reasoning (see the sketch below).
- If images are part of the inputs/outputs, copy representative images into `tmp/<experiment-dir>/reports/figures/` and embed them in the markdown via relative paths (e.g., `figures/<name>.png`).
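One possible shape for the generator, assuming it runs from `tmp/<experiment-dir>/` and simply echoes every `outputs/*.json` into markdown so the report stays a mechanical view of the recorded stats:

```python
# Report-generator sketch: no reasoning, just render outputs/*.json verbatim.
import json
from pathlib import Path

outputs = Path("outputs")
lines = ["# Experiment Report", ""]
for json_file in sorted(outputs.glob("*.json")):
    body = json.dumps(json.loads(json_file.read_text()), indent=2)
    lines += [f"## {json_file.stem}", "```json", body, "```", ""]
Path("reports/experiment-report.md").write_text("\n".join(lines))
```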
#### 8c) Write `reports/stakeholder-report.md` (agent-written)
- Read `reports/experiment-report.md` plus relevant `outputs/` and `logs/`.
- Produce `tmp/<experiment-dir>/reports/stakeholder-report.md` with deeper analysis that requires reasoning:
  - Interpret results vs expectations/targets
  - Call out risks, assumptions, and failure modes
  - Recommend next experiments and concrete integration guidance (if requested)
  - Summarize “go/no-go” criteria and what remains unknown
Also include:
- Benchmark & profiling results:
  - CPU/GPU model, RAM/VRAM, OS, Python version, key library versions
  - Latency breakdown if possible (preprocess / model / postprocess)
  - Throughput (items/s) and peak memory/VRAM
- Stats JSON:
  - For any stats included in the report, ensure the same values exist in a JSON file under `tmp/<experiment-dir>/outputs/` (e.g., `outputs/stats.json`).
- User metrics (if provided):
  - The metric definition + measurement method
  - Results on the chosen evaluation inputs
  - Any deltas vs the user’s targets and suggested next experiments
## Guardrails
- Do not commit large checkpoints or huge outputs; keep them under gitignored paths (`checkpoints/`, `tmp/`).
- Respect upstream licenses; record the repo URL + commit/tag in `reports/`.
- Avoid modifying runtime code under `src/` unless the user explicitly requests integration; keep exploration isolated to `tmp/<experiment-dir>`.