NSD Evaluation Reproducibility Guide
This file defines the reproducible analysis and plotting rules for the `nsd_evaluation/` workflow.
Scope
- Directory: `nsd_evaluation/`
- Inputs: NSD native predictions and labels.
- Outputs:
  - metric CSV files in `nsd_evaluation/results/`
  - visualization figures in `figures/nsd_evaluation/`
Environment
- Always run Python with the `masam_blackwell` environment.
- Preferred command style: `conda run -n masam_blackwell python <script>.py ...`
Canonical Evaluation Settings (EVC, r2 > 10)
Use these settings unless explicitly changed.
- ROI definition: EVC V1/V2/V3 ventral+dorsal (ROI labels `1,2,3,4,5,6`).
- R2 threshold: `--r2_threshold 10 --r2_comparison gt`.
- Additional eccentricity mask: `--eccentricity_gt_max 12`.
- Predictions: `eccentricity,polarAngle,pRFsize`.
- Model types: `baseline,transolver_optionA,transolver_optionC`.
- Seeds: `0 1 2`.
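The vertex-selection logic implied by the settings above can be sketched as boolean masks. This is a minimal illustration, not the evaluation script itself; the array names and values are hypothetical stand-ins for the per-vertex data loaded from the NSD FreeSurfer files.

```python
import numpy as np

# Hypothetical per-vertex arrays; the real ones come from the NSD
# FreeSurfer label and R2 files loaded by the evaluation script.
roi_labels = np.array([0, 1, 2, 3, 4, 5, 6, 7])          # ROI label per vertex
r2 = np.array([5.0, 12.0, 30.0, 8.0, 15.0, 40.0, 11.0, 50.0])
eccentricity = np.array([2.0, 4.0, 13.0, 1.0, 6.0, 8.0, 11.0, 3.0])

# EVC V1/V2/V3 ventral+dorsal: ROI labels 1-6.
roi_mask = np.isin(roi_labels, [1, 2, 3, 4, 5, 6])
# --r2_threshold 10 --r2_comparison gt: keep vertices with R2 > 10.
r2_mask = r2 > 10
# --eccentricity_gt_max 12: drop vertices with eccentricity above 12.
ecc_mask = eccentricity <= 12

# Metrics are computed only on vertices passing all three masks.
valid = roi_mask & r2_mask & ecc_mask
print(valid)
```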
Reference command:
conda run -n masam_blackwell python nsd_evaluation/evaluate_nsd_native_predictions_batch.py \
--nsd_dir /mnt/storage/junb/natural-scenes-dataset/nsddata/freesurfer \
--prediction_root <PREDICTION_ROOT> \
--label_root /mnt/storage/junb/natural-scenes-dataset/nsddata/freesurfer \
--hemispheres lh rh \
--predictions eccentricity polarAngle pRFsize \
--model_types baseline transolver_optionA transolver_optionC \
--seeds 0 1 2 \
--r2_threshold 10 \
--r2_comparison gt \
--eccentricity_gt_max 12 \
--roi_mode evc_v1_v2_v3 \
--prfangle_plot_dir figures/nsd_evaluation/prfangle_vertex_similarity \
--output_csv nsd_evaluation/results/nsd_native_prediction_metrics_evc_r2gt10.csv \
--summary_csv nsd_evaluation/results/nsd_native_prediction_metrics_summary_evc_r2gt10.csv
Metric Policy
- `polarAngle`: use `circular_correlation_mean` (astropy circular correlation) as primary.
- `eccentricity`: use both `pearson_correlation_mean` and `spearman_correlation_mean`.
- `pRFsize`: use both `pearson_correlation_mean` and `spearman_correlation_mean`.
- For comparability, keep `pearson_correlation_mean` for `polarAngle` as auxiliary.
- For NSD model-performance comparison plots, use:
  - `polarAngle` -> `circular_correlation_mean`
  - `eccentricity` -> `correlation_mean`
  - `pRFsize` -> `spearman_correlation_mean`
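For reference, the circular correlation used for `polarAngle` can be sketched in plain numpy. This is the Fisher-Lee statistic that astropy's circular-statistics routine computes; the pipeline itself uses astropy, so treat this as an illustrative re-implementation, not the canonical code path.

```python
import numpy as np

def circular_correlation(a_deg, b_deg):
    """Fisher-Lee circular correlation between two angle arrays (degrees).

    A numpy sketch of the statistic behind `circular_correlation_mean`;
    the actual pipeline computes it with astropy.
    """
    a = np.deg2rad(np.asarray(a_deg, dtype=float))
    b = np.deg2rad(np.asarray(b_deg, dtype=float))
    # Circular means via the angle of the mean resultant vector.
    a_bar = np.angle(np.mean(np.exp(1j * a)))
    b_bar = np.angle(np.mean(np.exp(1j * b)))
    sa = np.sin(a - a_bar)
    sb = np.sin(b - b_bar)
    return np.sum(sa * sb) / np.sqrt(np.sum(sa**2) * np.sum(sb**2))

angles = [10.0, 80.0, 150.0, 220.0, 300.0]
print(circular_correlation(angles, angles))  # identical angles -> 1.0
```

Unlike a plain Pearson correlation, this statistic is invariant to angle wrap-around, which is why it is the primary metric for `polarAngle`.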
RH Renewed Update Rule
When renewed RH weights are applied and evaluated:
- Write renewed outputs with the `_RH_renewed` suffix, for example:
  - `nsd_from_fs_curv_native_prediction_metrics_evc_r2gt10_RH_renewed.csv`
  - `nsd_from_fs_curv_native_prediction_metrics_summary_evc_r2gt10_RH_renewed.csv`
- Compare against old RH results with the `_RH_renewed_vs_old` suffix:
  - per-subject comparison CSV
  - summary comparison CSV
- Before replacing canonical files, create backups: `*_before_RH_renew.csv`
- Replace only RH rows in canonical files; keep LH rows unchanged.
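The RH-only replacement rule can be sketched with plain dicts standing in for CSV rows. Column names and values here are illustrative, not the real CSV schema, and in practice this runs on the canonical CSVs only after the `*_before_RH_renew.csv` backups exist.

```python
# Canonical file rows (illustrative): LH rows must survive unchanged.
canonical = [
    {"hemisphere": "lh", "metric": 0.61},
    {"hemisphere": "rh", "metric": 0.48},
]
# Rows from the *_RH_renewed evaluation output.
rh_renewed = [
    {"hemisphere": "rh", "metric": 0.55},
]

# Keep LH rows from the canonical file; take RH rows from the renewed file.
merged = [r for r in canonical if r["hemisphere"] == "lh"]
merged += [r for r in rh_renewed if r["hemisphere"] == "rh"]
print(merged)
```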
Canonical CSV Files
Main from-fs-curv evaluation files:
- `nsd_evaluation/results/nsd_from_fs_curv_native_prediction_metrics_evc_r2gt10.csv`
- `nsd_evaluation/results/nsd_from_fs_curv_native_prediction_metrics_summary_evc_r2gt10.csv`
Native baseline files:
- `nsd_evaluation/results/nsd_native_prediction_metrics_evc_r2gt10.csv`
- `nsd_evaluation/results/nsd_native_prediction_metrics_summary_evc_r2gt10.csv`
Comparison CSV Conventions
For side-by-side native vs from-fs-curv comparisons:
- Per-seed, per-setting, per-hemisphere:
  `nsd_evaluation/results/nsd_from_fs_curv_vs_native_side_by_side_by_setting_hemisphere.csv`
- Seed-aggregated (mean/std/CI95) per setting and hemisphere:
  `nsd_evaluation/results/nsd_from_fs_curv_vs_native_side_by_setting_hemisphere_seedavg.csv`
- `delta_*` columns must always be `from_fs_curv - native`.
Plot Reproducibility Rules
1) Combined source comparison bar plot
Script:
nsd_evaluation/plot_nsd_model_type_comparison_bar.py
Default output:
- `figures/nsd_evaluation/nsd_model_type_comparison_bar_ci95.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_bar_ci95.pdf`
Inputs:
- native summary CSV
- from-fs-curv summary CSV
2) Separate source plots by metric
Script:
nsd_evaluation/plot_nsd_model_type_comparison_separate_sources.py
Default outputs:
- `figures/nsd_evaluation/nsd_model_type_comparison_native_ci95_by_metric.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_native_ci95_by_metric.pdf`
- `figures/nsd_evaluation/nsd_model_type_comparison_from_fs_curv_ci95_by_metric.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_from_fs_curv_ci95_by_metric.pdf`
- `figures/nsd_evaluation/nsd_model_type_comparison_legend.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_legend.pdf`
Important current behavior for this script:
- It uses only `MODEL_TYPE_ORDER = ["baseline", "transolver_optionC"]`.
- It writes the legend separately (there is no in-plot legend in the main figures).
Model Label Notes
Canonical naming in this repository:
- `baseline` -> `deepRetinotopy`
- `transolver_optionA` -> `Retinosolver`
- `transolver_optionC` -> `Transolver`
If a plotting script uses a legacy label mapping, keep it only for strict figure reproducibility and document it in commit/notes.
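The canonical mapping above can be kept in one place as a plain dict, so plotting scripts relabel internal model types consistently. The constant name `MODEL_LABELS` is illustrative, not an identifier from the repository.

```python
# Canonical display-name mapping for figures (from the table above).
MODEL_LABELS = {
    "baseline": "deepRetinotopy",
    "transolver_optionA": "Retinosolver",
    "transolver_optionC": "Transolver",
}

# Relabel an internal model_type for a figure legend.
print(MODEL_LABELS["transolver_optionC"])
```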
Validation Checklist
After evaluation or CSV replacement, verify:
- Hemisphere row counts are correct in the summary CSV: `lh: 27`, `rh: 27` (3 predictions x 3 models x 3 seeds).
- Detailed CSV row counts are correct: `lh: 216`, `rh: 216` (8 subjects x 3 predictions x 3 models x 3 seeds).
- For RH replacement, RH rows in canonical files numerically match the `_RH_renewed` source.
- For inference batches, `failed_combinations.txt` is empty.
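The expected row counts in the checklist follow directly from the canonical settings and can be checked mechanically. This is a sketch; the helper name `check_counts` is hypothetical and the real counts would come from reading the CSVs.

```python
# Counts derived from the canonical evaluation settings.
N_PREDICTIONS, N_MODELS, N_SEEDS, N_SUBJECTS = 3, 3, 3, 8

# Summary CSV: one row per (prediction, model, seed) per hemisphere.
expected_summary_rows_per_hemi = N_PREDICTIONS * N_MODELS * N_SEEDS
# Detailed CSV: additionally one row per subject.
expected_detail_rows_per_hemi = N_SUBJECTS * expected_summary_rows_per_hemi

def check_counts(rows_lh, rows_rh, expected):
    """Return True if both hemisphere row counts match the expectation."""
    return rows_lh == expected and rows_rh == expected

print(expected_summary_rows_per_hemi, expected_detail_rows_per_hemi)
```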
Long-Run Execution
For long inference runs, use `screen` and log output to a file.
Example:
screen -dmS nsd_rerun bash -lc 'cd /mnt/scratch/junb/deepRetinotopy && <command> > logs/nsd_rerun.log 2>&1'
Do Not Do
- Do not run analysis outside the `masam_blackwell` environment.
- Do not overwrite canonical CSVs without writing explicit backups.
- Do not change the delta sign convention (`from_fs_curv - native`).