NSD Evaluation Reproducibility Guide
This file defines the reproducible analysis and plotting rules for the `nsd_evaluation/` workflow.
Scope
- Directory: `nsd_evaluation/`
- Inputs: NSD native predictions and labels.
- Outputs:
  - metric CSV files in `nsd_evaluation/results/`
  - visualization figures in `figures/nsd_evaluation/`
Environment
- Always run Python with the `masam_blackwell` environment.
- Preferred command style: `conda run -n masam_blackwell python <script>.py ...`
Canonical Evaluation Settings (EVC, r2 > 10)
Use these settings unless explicitly changed.
- ROI definition: EVC V1/V2/V3 ventral+dorsal (ROI labels `1,2,3,4,5,6`).
- R2 threshold: `--r2_threshold 10 --r2_comparison gt`.
- Additional eccentricity mask: `--eccentricity_gt_max 12`.
- Predictions: `eccentricity,polarAngle,pRFsize`.
- Model types: `baseline,transolver_optionA,transolver_optionC`.
- Seeds: `0 1 2`.
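The vertex-selection logic implied by the settings above can be sketched as boolean masks. This is a minimal illustration, not the evaluation script itself; the array names and values are hypothetical stand-ins for the per-vertex data loaded from the NSD FreeSurfer files.

```python
import numpy as np

# Hypothetical per-vertex arrays; the real ones come from the NSD
# FreeSurfer label and R2 files loaded by the evaluation script.
roi_labels = np.array([0, 1, 2, 3, 4, 5, 6, 7])          # ROI label per vertex
r2 = np.array([5.0, 12.0, 30.0, 8.0, 15.0, 40.0, 11.0, 50.0])
eccentricity = np.array([2.0, 4.0, 13.0, 1.0, 6.0, 8.0, 11.0, 3.0])

# EVC V1/V2/V3 ventral+dorsal: ROI labels 1-6.
roi_mask = np.isin(roi_labels, [1, 2, 3, 4, 5, 6])
# --r2_threshold 10 --r2_comparison gt: keep vertices with R2 > 10.
r2_mask = r2 > 10
# --eccentricity_gt_max 12: drop vertices with eccentricity above 12.
ecc_mask = eccentricity <= 12

# Metrics are computed only on vertices passing all three masks.
valid = roi_mask & r2_mask & ecc_mask
print(valid)
```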
Reference command:
conda run -n masam_blackwell python nsd_evaluation/evaluate_nsd_native_predictions_batch.py \
--nsd_dir /mnt/storage/junb/natural-scenes-dataset/nsddata/freesurfer \
--prediction_root <PREDICTION_ROOT> \
--label_root /mnt/storage/junb/natural-scenes-dataset/nsddata/freesurfer \
--hemispheres lh rh \
--predictions eccentricity polarAngle pRFsize \
--model_types baseline transolver_optionA transolver_optionC \
--seeds 0 1 2 \
--r2_threshold 10 \
--r2_comparison gt \
--eccentricity_gt_max 12 \
--roi_mode evc_v1_v2_v3 \
--prfangle_plot_dir figures/nsd_evaluation/prfangle_vertex_similarity \
--output_csv nsd_evaluation/results/nsd_native_prediction_metrics_evc_r2gt10.csv \
--summary_csv nsd_evaluation/results/nsd_native_prediction_metrics_summary_evc_r2gt10.csv
Metric Policy
- `polarAngle`: use `circular_correlation_mean` (astropy circular correlation) as primary.
- `eccentricity`: use both `pearson_correlation_mean` and `spearman_correlation_mean`.
- `pRFsize`: use both `pearson_correlation_mean` and `spearman_correlation_mean`.
- For comparability, keep `pearson_correlation_mean` for `polarAngle` as auxiliary.
- For NSD model-performance comparison plots, use:
  - `polarAngle` -> `circular_correlation_mean`
  - `eccentricity` -> `correlation_mean`
  - `pRFsize` -> `spearman_correlation_mean`
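For reference, the circular correlation used for `polarAngle` can be sketched in plain numpy. This is the Fisher-Lee statistic that astropy's circular-statistics routine computes; the pipeline itself uses astropy, so treat this as an illustrative re-implementation, not the canonical code path.

```python
import numpy as np

def circular_correlation(a_deg, b_deg):
    """Fisher-Lee circular correlation between two angle arrays (degrees).

    A numpy sketch of the statistic behind `circular_correlation_mean`;
    the actual pipeline computes it with astropy.
    """
    a = np.deg2rad(np.asarray(a_deg, dtype=float))
    b = np.deg2rad(np.asarray(b_deg, dtype=float))
    # Circular means via the angle of the mean resultant vector.
    a_bar = np.angle(np.mean(np.exp(1j * a)))
    b_bar = np.angle(np.mean(np.exp(1j * b)))
    sa = np.sin(a - a_bar)
    sb = np.sin(b - b_bar)
    return np.sum(sa * sb) / np.sqrt(np.sum(sa**2) * np.sum(sb**2))

angles = [10.0, 80.0, 150.0, 220.0, 300.0]
print(circular_correlation(angles, angles))  # identical angles -> 1.0
```

Unlike a plain Pearson correlation, this statistic is invariant to angle wrap-around, which is why it is the primary metric for `polarAngle`.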
RH Renewed Update Rule
When renewed RH weights are applied and evaluated:
- Write renewed outputs with the `_RH_renewed` suffix, for example:
  - `nsd_from_fs_curv_native_prediction_metrics_evc_r2gt10_RH_renewed.csv`
  - `nsd_from_fs_curv_native_prediction_metrics_summary_evc_r2gt10_RH_renewed.csv`
- Compare against old RH results with the `_RH_renewed_vs_old` suffix:
  - per-subject comparison CSV
  - summary comparison CSV
- Before replacing canonical files, create backups: `*_before_RH_renew.csv`
- Replace only RH rows in canonical files; keep LH rows unchanged.
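The RH-only replacement rule can be sketched with plain dicts standing in for CSV rows. Column names and values here are illustrative, not the real CSV schema, and in practice this runs on the canonical CSVs only after the `*_before_RH_renew.csv` backups exist.

```python
# Canonical file rows (illustrative): LH rows must survive unchanged.
canonical = [
    {"hemisphere": "lh", "metric": 0.61},
    {"hemisphere": "rh", "metric": 0.48},
]
# Rows from the *_RH_renewed evaluation output.
rh_renewed = [
    {"hemisphere": "rh", "metric": 0.55},
]

# Keep LH rows from the canonical file; take RH rows from the renewed file.
merged = [r for r in canonical if r["hemisphere"] == "lh"]
merged += [r for r in rh_renewed if r["hemisphere"] == "rh"]
print(merged)
```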
Canonical CSV Files
Main from-fs-curv evaluation files:
- `nsd_evaluation/results/nsd_from_fs_curv_native_prediction_metrics_evc_r2gt10.csv`
- `nsd_evaluation/results/nsd_from_fs_curv_native_prediction_metrics_summary_evc_r2gt10.csv`
Native baseline files:
- `nsd_evaluation/results/nsd_native_prediction_metrics_evc_r2gt10.csv`
- `nsd_evaluation/results/nsd_native_prediction_metrics_summary_evc_r2gt10.csv`
Comparison CSV Conventions
For side-by-side native vs from-fs-curv comparisons:
- Per-seed, per-setting, per-hemisphere:
  `nsd_evaluation/results/nsd_from_fs_curv_vs_native_side_by_side_by_setting_hemisphere.csv`
- Seed-aggregated (mean/std/CI95) per setting and hemisphere:
  `nsd_evaluation/results/nsd_from_fs_curv_vs_native_side_by_setting_hemisphere_seedavg.csv`
- `delta_*` columns must always be `from_fs_curv - native`.
Plot Reproducibility Rules
1) Combined source comparison bar plot
Script:
nsd_evaluation/plot_nsd_model_type_comparison_bar.py
Default output:
- `figures/nsd_evaluation/nsd_model_type_comparison_bar_ci95.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_bar_ci95.pdf`
Inputs:
- native summary CSV
- from-fs-curv summary CSV
2) Separate source plots by metric
Script:
nsd_evaluation/plot_nsd_model_type_comparison_separate_sources.py
Default outputs:
- `figures/nsd_evaluation/nsd_model_type_comparison_native_ci95_by_metric.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_native_ci95_by_metric.pdf`
- `figures/nsd_evaluation/nsd_model_type_comparison_from_fs_curv_ci95_by_metric.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_from_fs_curv_ci95_by_metric.pdf`
- `figures/nsd_evaluation/nsd_model_type_comparison_legend.png`
- `figures/nsd_evaluation/nsd_model_type_comparison_legend.pdf`
Important current behavior for this script:
- It uses only `MODEL_TYPE_ORDER = ["baseline", "transolver_optionC"]`.
- It writes the legend separately (there is no in-plot legend in the main figures).
Model Label Notes
Canonical naming in this repository:
- `baseline` -> `deepRetinotopy`
- `transolver_optionA` -> `Retinosolver`
- `transolver_optionC` -> `Transolver`
If a plotting script uses a legacy label mapping, keep it only for strict figure reproducibility and document it in commit/notes.
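The canonical mapping above can be kept in one place as a plain dict, so plotting scripts relabel internal model types consistently. The constant name `MODEL_LABELS` is illustrative, not an identifier from the repository.

```python
# Canonical display-name mapping for figures (from the table above).
MODEL_LABELS = {
    "baseline": "deepRetinotopy",
    "transolver_optionA": "Retinosolver",
    "transolver_optionC": "Transolver",
}

# Relabel an internal model_type for a figure legend.
print(MODEL_LABELS["transolver_optionC"])
```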
Validation Checklist
After evaluation or CSV replacement, verify:
- Hemisphere row counts are correct in the summary CSV: `lh: 27`, `rh: 27` (3 predictions x 3 models x 3 seeds).
- Detailed CSV row counts are correct: `lh: 216`, `rh: 216` (8 subjects x 3 predictions x 3 models x 3 seeds).
- For RH replacement, RH rows in canonical files numerically match the `_RH_renewed` source.
- For inference batches, `failed_combinations.txt` is empty.
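The expected row counts in the checklist follow directly from the canonical settings and can be checked mechanically. This is a sketch; the helper name `check_counts` is hypothetical and the real counts would come from reading the CSVs.

```python
# Counts derived from the canonical evaluation settings.
N_PREDICTIONS, N_MODELS, N_SEEDS, N_SUBJECTS = 3, 3, 3, 8

# Summary CSV: one row per (prediction, model, seed) per hemisphere.
expected_summary_rows_per_hemi = N_PREDICTIONS * N_MODELS * N_SEEDS
# Detailed CSV: additionally one row per subject.
expected_detail_rows_per_hemi = N_SUBJECTS * expected_summary_rows_per_hemi

def check_counts(rows_lh, rows_rh, expected):
    """Return True if both hemisphere row counts match the expectation."""
    return rows_lh == expected and rows_rh == expected

print(expected_summary_rows_per_hemi, expected_detail_rows_per_hemi)
```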
Long-Run Execution
For long inference runs, use `screen` and log output to a file.
Example:
screen -dmS nsd_rerun bash -lc 'cd /mnt/scratch/junb/deepRetinotopy && <command> > logs/nsd_rerun.log 2>&1'
Do Not Do
- Do not run analysis outside the `masam_blackwell` environment.
- Do not overwrite canonical CSVs without writing explicit backups.
- Do not change the delta sign convention (`from_fs_curv - native`).