name: immunopipe-config
description: Master skill for generating immunopipe pipeline configurations. Determines pipeline architecture based on data type (scRNA-seq with or without scTCR/BCR-seq) and analysis requirements. Routes to individual process skills for detailed configuration. Use this skill when starting a new immunopipe configuration or modifying pipeline-level options.

Immunopipe Configuration Generator (Main Skill)

Purpose: Master skill for generating immunopipe pipeline configurations. Routes to individual process skills and determines pipeline architecture based on analysis requirements.

When to Use This Skill

The user wants to create or modify immunopipe configuration files
You need to determine which processes to enable based on analysis goals
You need to configure pipeline-level options (name, outdir, forks, scheduler)
You need to route to a specific process configuration skill

Pipeline Architecture Decision Tree

Step 1: Data Type Assessment

Ask the user about their data:

1. Do you have scRNA-seq data?
   - If YES → RNA analysis processes needed
   - If NO → Cannot proceed (RNA data required)
2. Do you have scTCR-seq or scBCR-seq data?
   - If YES → Enable TCR/BCR processes (TCR route)
   - If NO → RNA-only analysis (No-TCR route)
3. Is your RNA data already processed in a Seurat object?
   - If YES → Use LoadingRNAFromSeurat instead of SampleInfo + SeuratPreparing
   - If NO → Use standard input via SampleInfo

Step 2: Analysis Goals

Ask what analyses they want to perform:

| Goal | Required Processes | Routing |
| --- | --- | --- |
| Basic clustering & visualization | `SampleInfo`, `SeuratPreparing`, `SeuratClustering`, `SeuratClusterStats` | Use `sampleinfo`, `seuratpreparing`, `seuratclustering`, `seuratclusterstats` skills |
| T/B cell selection | Add `TOrBCellSelection` | Use `torbcellselection` skill |
| Cell type annotation | Add `CellTypeAnnotation` or `SeuratMap2Ref` | Use `celltypeannotation` or `seuratmap2ref` skills |
| Marker finding | Add `ClusterMarkers` or `MarkersFinder` | Use `clustermarkers` or `markersfinder` skills |
| TCR clonotype analysis | Add `CDR3Clustering`, `TESSA`, `ClonalStats` | Use `cdr3clustering`, `tessa`, `clonalstats` skills |
| Cell-cell communication | Add `CellCellCommunication` | Use `cellcellcommunication` skill |
| Pathway enrichment | Add `ScFGSEA` | Use `scfgsea` skill |
| Metabolic analysis | Add `ScrnaMetabolicLandscape` | Use `scrnametaboliclandscape` skill |
| Differential expression | Add `PseudoBulkDEG` | Use `pseudobulkdeg` skill |

Step 3: Essential vs Optional Processes

Essential Processes (always needed for TCR route):

SampleInfo (or LoadingRNAFromSeurat)
ScRepLoading (if TCR/BCR data present)
SeuratPreparing (unless loading from prepared Seurat object)
SeuratClustering
SeuratClusterStats

Essential Processes (RNA-only route):

SampleInfo (or LoadingRNAFromSeurat)
SeuratPreparing
SeuratClustering
SeuratClusterStats

Optional Processes (enable only if requested):

TOrBCellSelection - T/B cell separation
SeuratClusteringOfAllCells - Clustering before T/B selection
ClusterMarkersOfAllCells - Markers before T/B selection
TopExpressingGenesOfAllCells - Top genes before T/B selection
CellTypeAnnotation - Automated cell type annotation
SeuratMap2Ref - Reference-based annotation
SeuratSubClustering - Sub-clustering analysis
ClusterMarkers - Differential expression between clusters
TopExpressingGenes - Top expressed genes per cluster
MarkersFinder - Flexible marker finding
ModuleScoreCalculator - Module/pathway scoring
ScRepCombiningExpression - TCR + RNA integration
CDR3Clustering - TCR CDR3 clustering
TESSA - TCR-specific analysis
CDR3AAPhyschem - CDR3 physicochemical properties
ClonalStats - Clonality statistics
CellCellCommunication - Ligand-receptor analysis
CellCellCommunicationPlots - Communication plots
ScFGSEA - Fast gene set enrichment
PseudoBulkDEG - Pseudo-bulk differential expression
ScrnaMetabolicLandscape - Comprehensive metabolic analysis

Pipeline-Level Configuration

Basic Pipeline Options

name = "my_pipeline"           # Pipeline name (affects workdir and outdir)
outdir = "./output"            # Output directory (default: ./<name>-output)
loglevel = "info"              # Logging level: debug, info, warning, error
forks = 4                      # Number of parallel jobs (adjust based on CPU cores)
cache = true                   # Enable caching (recommended)
error_strategy = "halt"        # halt, ignore, or retry
num_retries = 3                # Number of retries if error_strategy = "retry"
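
The listing above sets error_strategy = "halt" alongside num_retries, although the retry count only matters under the "retry" strategy. A minimal sketch of the retry variant, using only the option names listed above (the pipeline name is a placeholder):

```toml
# Hypothetical pipeline name, for illustration only
name = "my_pipeline"

# Re-run a failed job instead of halting the whole pipeline
error_strategy = "retry"

# Only consulted because error_strategy is "retry"
num_retries = 3
```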

Scheduler Configuration

Local execution (default):

scheduler = "local"

SLURM cluster:

scheduler = "slurm"

[scheduler_opts]
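# These flags follow qsub (SGE-style) conventions; confirm the option key and flags against your SLURM setup
qsub_opts = "-p general -q general -N {job.name} -t {job.index}"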

SGE cluster:

scheduler = "sge"

[scheduler_opts]
qsub_opts = "-V -cwd -j yes"

Google Cloud Batch:

# Use: immunopipe gbatch instead of immunopipe
# See gbatch skill for configuration

Plugin Options

[plugin_opts.report]
filters = ["name:Filter"]  # Filter processes in report

[plugin_opts.runinfo]
# Runinfo plugin enabled by default

Routing to Process Skills

When the user needs configuration for a specific process, route to the appropriate skill:

Core Input Processes

SampleInfo: Use sampleinfo skill
LoadingRNAFromSeurat: Use loadingrnafromseurat skill
ScRepLoading: Use screploading skill

Preprocessing Processes

SeuratPreparing: Use seuratpreparing skill

Clustering Processes

SeuratClustering: Use seuratclustering skill
SeuratClusteringOfAllCells: Use seuratclusteringofallcells skill
SeuratSubClustering: Use seuratsubclustering skill

Cell Selection

TOrBCellSelection: Use torbcellselection skill

Annotation Processes

CellTypeAnnotation: Use celltypeannotation skill
SeuratMap2Ref: Use seuratmap2ref skill

Marker Analysis

ClusterMarkers: Use clustermarkers skill
ClusterMarkersOfAllCells: Use clustermarkersofallcells skill
MarkersFinder: Use markersfinder skill
TopExpressingGenes: Use topexpressinggenes skill
TopExpressingGenesOfAllCells: Use topexpressinggenesofallcells skill

TCR/BCR Analysis

ScRepCombiningExpression: Use screpcombiningexpression skill
CDR3Clustering: Use cdr3clustering skill
TESSA: Use tessa skill
CDR3AAPhyschem: Use cdr3aaphyschem skill
ClonalStats: Use clonalstats skill

Downstream Analysis

ModuleScoreCalculator: Use modulescorecalculator skill
CellCellCommunication: Use cellcellcommunication skill
CellCellCommunicationPlots: Use cellcellcommunicationplots skill
SeuratClusterStats: Use seuratclusterstats skill
ScFGSEA: Use scfgsea skill
PseudoBulkDEG: Use pseudobulkdeg skill

Metabolic Analysis

ScrnaMetabolicLandscape: Use scrnametaboliclandscape skill

Configuration File Structure

A complete TOML configuration file has three sections:

# 1. PIPELINE-LEVEL OPTIONS
name = "my_pipeline"
outdir = "./output"
forks = 4

# 2. PROCESS-LEVEL OPTIONS
[ProcessName]
cache = true
forks = 2  # Override pipeline-level forks for this process

[ProcessName.in]
# Input files specification

[ProcessName.envs]
# Process parameters (the process's envs; these are not shell environment variables)

# 3. GOOGLE BATCH OPTIONS (if using immunopipe gbatch)
[cli-gbatch]
project = "my-gcp-project"
region = "us-central1"
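
As a concrete illustration of the first two sections, the sketch below fills the ProcessName placeholders with real process names from this skill. The file path is a placeholder, and the `envs.FindClusters.resolution` key is an assumption based on the FindClusters() function this skill lists under Seurat Functions; confirm the exact option names with the seuratclustering skill before using it.

```toml
# 1. Pipeline-level options
name = "my_pipeline"
outdir = "./output"
forks = 4

# 2. Process-level options
[SampleInfo.in]
# Placeholder path to the sample information file
infile = ["sample_info.txt"]

[SeuratClustering]
# Process-level setting, applies to this process only
cache = true

[SeuratClustering.envs.FindClusters]
# Assumed parameter name and value, shown only to illustrate nesting under envs
resolution = 0.8
```

Top-level keys come before any table header, because TOML assigns every key that follows a `[table]` line to that table.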

Example Workflows

Example 1: Basic TCR Analysis

User request: "I have scRNA-seq and scTCR-seq data. I want basic analysis with T cell selection."

Response:

Enable essential TCR processes: SampleInfo, ScRepLoading, SeuratPreparing, SeuratClustering, SeuratClusterStats
Enable T cell selection: SeuratClusteringOfAllCells, TOrBCellSelection
Route to sampleinfo skill to configure input files
Route to each process skill for configuration

Minimal config:

name = "tcr_analysis"
forks = 4

[SampleInfo.in]
infile = ["sample_info.txt"]

[SeuratClusteringOfAllCells]
[TOrBCellSelection]

Example 2: Advanced RNA-only Analysis

User request: "RNA-only data. I need clustering, cell type annotation, marker finding, and pathway enrichment."

Response:

Enable essential RNA processes: SampleInfo, SeuratPreparing, SeuratClustering, SeuratClusterStats
Add requested analyses: CellTypeAnnotation, ClusterMarkers, ScFGSEA
Route to individual skills for configuration
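
A minimal config for this request might look like the sketch below, modeled on the Example 1 config. The pipeline name and sample file path are placeholders, and it assumes, as in Example 1, that an empty section header is enough to enable an optional process. Process-specific options are omitted and would come from the routed skills.

```toml
# Hypothetical pipeline name
name = "rna_only_analysis"
forks = 4

[SampleInfo.in]
# Placeholder path; with RNA-only data the file carries no TCR/BCR columns
infile = ["sample_info.txt"]

# Optional processes requested by the user
[CellTypeAnnotation]
[ClusterMarkers]
[ScFGSEA]
```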

Example 3: Loading from Prepared Seurat Object

User request: "I already have a processed Seurat object. I want to run TCR analysis."

Response:

Use LoadingRNAFromSeurat instead of SampleInfo + SeuratPreparing
Enable the remaining TCR-route processes: ScRepLoading, SeuratClustering, SeuratClusterStats
Set prepared = true in LoadingRNAFromSeurat to skip preprocessing
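
A sketch of the entry-point part of such a config follows. It assumes that `prepared` lives under the process's envs table (following the [ProcessName.envs] convention in Configuration File Structure) and that the input key is `infile`, by analogy with SampleInfo. Both should be confirmed with the loadingrnafromseurat skill. The pipeline name and file path are placeholders.

```toml
# Hypothetical pipeline name
name = "tcr_from_seurat"

[LoadingRNAFromSeurat.in]
# Placeholder path to the existing Seurat object; the key name is an assumption
infile = ["seurat_object.rds"]

[LoadingRNAFromSeurat.envs]
# Treat the object as already preprocessed, so SeuratPreparing is skipped
prepared = true
```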

Important Notes

Process Dependencies

Some processes have dependencies:

ScRepCombiningExpression requires both ScRepLoading and RNA input
ClusterMarkers requires SeuratClustering
TOrBCellSelection usually follows SeuratClusteringOfAllCells
CellCellCommunication requires clustering to be complete

Mutually Exclusive Options

Use EITHER SampleInfo OR LoadingRNAFromSeurat as the entry point (not both)
If using TOrBCellSelection, typically enable SeuratClusteringOfAllCells as well, so that clustering of all cells runs first
CellTypeAnnotation and SeuratMap2Ref serve similar purposes; both can be enabled, but one is usually sufficient

Cache Strategy

Set cache = "force" at pipeline level to reuse all previous results
Set cache = false for specific process to force re-run
Combining the two is useful when tweaking visualization parameters without re-running the upstream analysis
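
The two settings can be combined in one file, as in this sketch. SeuratClusterStats is chosen only as an example of a process whose plots are being adjusted.

```toml
# Pipeline level: reuse all previous results
cache = "force"

[SeuratClusterStats]
# Process level: force this one process to re-run with the new settings
cache = false
```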

Configuration Validation

After generating the configuration, validate it with:

python -m immunopipe.validate_config config.toml

External References

When process options reference external packages, expand them:

Seurat Functions

When seeing Seurat::FunctionName, check: https://satijalab.org/seurat/reference/
Common functions: FindMarkers(), FindClusters(), SCTransform(), RunUMAP()

Plotthis Functions

Plot types map to functions: bar → BarPlot, box → BoxPlot
Full reference: https://pwwang.github.io/plotthis/reference/

DESeq2 Design

For PseudoBulkDEG, design formulas use DESeq2 syntax
Reference: https://bioconductor.org/packages/release/bioc/html/DESeq2.html

GSEA Databases

For ScFGSEA, gene sets are supplied as GMT files, for example from MSigDB
Reference: https://www.gsea-msigdb.org/gsea/msigdb/

CellChat Database

For CellCellCommunication, ligand-receptor resources include the CellChat databases
Reference: http://www.cellchat.org/

Workflow Summary

1. Assess data type (RNA-only vs TCR/BCR)
2. Determine analysis goals (clustering, annotation, TCR analysis, etc.)
3. Select essential processes based on data type
4. Add optional processes based on goals
5. Configure pipeline-level options (name, forks, scheduler)
6. Route to individual process skills for detailed configuration
7. Generate complete TOML file
8. Validate configuration before running

Quick Start Templates

For quick starts, use these templates:

Basic TCR: basic-tcr template skill
Basic RNA-only: basic-rna template skill
Advanced TCR: advanced-tcr template skill
Metabolic analysis: metabolic template skill
Cell communication: communication template skill

Error Prevention

Common configuration errors to avoid:

Missing input specification: Always set [ProcessName.in] for entry processes
TCR data without ScRepLoading: If TCRData/BCRData columns exist, enable ScRepLoading
Contradictory process enablement: Don't enable both "OfAllCells" and regular versions without TOrBCellSelection
Invalid gene names: Use human gene symbols in uppercase (e.g., CD3E) or mouse gene symbols in title case (e.g., Cd3e)
Path issues: Use absolute paths or paths relative to config file location
Resource limits: Set appropriate forks based on available CPU/memory

Next Steps

After generating config:

Save to .toml file (e.g., config.toml)
Run: immunopipe @config.toml
Or use web UI: pipen board @config.toml
Or use Google Batch: immunopipe gbatch @config.toml

For modifications, route to specific process skills based on what needs to change.
