name: immunopipe-config
description: Master skill for generating immunopipe pipeline configurations. Determines pipeline architecture based on data type (scRNA-seq with or without scTCR/BCR-seq) and analysis requirements. Routes to individual process skills for detailed configuration. Use this skill when starting a new immunopipe configuration or modifying pipeline-level options.
Immunopipe Configuration Generator (Main Skill)
Purpose: Master skill for generating immunopipe pipeline configurations. Routes to individual process skills and determines pipeline architecture based on analysis requirements.
When to Use This Skill
- User wants to create/modify immunopipe configuration files
- Need to determine which processes to enable based on analysis goals
- Need to configure pipeline-level options (name, outdir, forks, scheduler)
- Need routing to specific process configuration skills
Pipeline Architecture Decision Tree
Step 1: Data Type Assessment
Ask the user about their data:
- Do you have scRNA-seq data?
  - If YES → RNA analysis processes needed
  - If NO → Cannot proceed (RNA data required)
- Do you have scTCR-seq or scBCR-seq data?
  - If YES → Enable TCR/BCR processes (TCR route)
  - If NO → RNA-only analysis (No-TCR route)
- Is your RNA data already processed in a Seurat object?
  - If YES → Use LoadingRNAFromSeurat instead of SampleInfo + SeuratPreparing
  - If NO → Use standard input via SampleInfo
Step 2: Analysis Goals
Ask what analyses they want to perform:
| Goal | Required Processes | Routing |
|---|---|---|
| Basic clustering & visualization | SampleInfo, SeuratPreparing, SeuratClustering, SeuratClusterStats | Use sampleinfo, seuratpreparing, seuratclustering, seuratclusterstats skills |
| T/B cell selection | Add TOrBCellSelection | Use torbcellselection skill |
| Cell type annotation | Add CellTypeAnnotation or SeuratMap2Ref | Use celltypeannotation or seuratmap2ref skills |
| Marker finding | Add ClusterMarkers or MarkersFinder | Use clustermarkers or markersfinder skills |
| TCR clonotype analysis | Add CDR3Clustering, TESSA, ClonalStats | Use cdr3clustering, tessa, clonalstats skills |
| Cell-cell communication | Add CellCellCommunication | Use cellcellcommunication skill |
| Pathway enrichment | Add ScFGSEA | Use scfgsea skill |
| Metabolic analysis | Add ScrnaMetabolicLandscape | Use scrnametaboliclandscape skill |
| Differential expression | Add PseudoBulkDEG | Use pseudobulkdeg skill |
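As a rough sketch of how a row of this table turns into configuration, the fragment below adds the TCR clonotype processes to a basic TCR setup. It follows the pattern of the minimal TCR example in this document, where a process is listed as an empty section. Treating an empty section as sufficient to enable a process is an assumption here, and the pipeline name and the sample file name are placeholders. The actual options for each process come from its own skill.

```toml
# Placeholder pipeline name; top-level keys come before any section.
name = "tcr_clonotypes"
forks = 4

[SampleInfo.in]
infile = ["sample_info.txt"]  # placeholder sample sheet

# Goal: TCR clonotype analysis (see the table row above).
# Sections are left empty; fill them in via the cdr3clustering,
# tessa, and clonalstats skills.
[CDR3Clustering]
[TESSA]
[ClonalStats]
```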
Step 3: Essential vs Optional Processes
Essential Processes (always needed for TCR route):
- SampleInfo (or LoadingRNAFromSeurat)
- ScRepLoading (if TCR/BCR data present)
- SeuratPreparing (unless loading from prepared Seurat object)
- SeuratClustering
- SeuratClusterStats
Essential Processes (RNA-only route):
- SampleInfo (or LoadingRNAFromSeurat)
- SeuratPreparing
- SeuratClustering
- SeuratClusterStats
Optional Processes (enable only if requested):
- TOrBCellSelection - T/B cell separation
- SeuratClusteringOfAllCells - Clustering before T/B selection
- ClusterMarkersOfAllCells - Markers before T/B selection
- TopExpressingGenesOfAllCells - Top genes before T/B selection
- CellTypeAnnotation - Automated cell type annotation
- SeuratMap2Ref - Reference-based annotation
- SeuratSubClustering - Sub-clustering analysis
- ClusterMarkers - Differential expression between clusters
- TopExpressingGenes - Top expressed genes per cluster
- MarkersFinder - Flexible marker finding
- ModuleScoreCalculator - Module/pathway scoring
- ScRepCombiningExpression - TCR + RNA integration
- CDR3Clustering - TCR CDR3 clustering
- TESSA - TCR-specific analysis
- CDR3AAPhyschem - CDR3 physicochemical properties
- ClonalStats - Clonality statistics
- CellCellCommunication - Ligand-receptor analysis
- CellCellCommunicationPlots - Communication plots
- ScFGSEA - Fast gene set enrichment
- PseudoBulkDEG - Pseudo-bulk differential expression
- ScrnaMetabolicLandscape - Comprehensive metabolic analysis
Pipeline-Level Configuration
Basic Pipeline Options
name = "my_pipeline" # Pipeline name (affects workdir and outdir)
outdir = "./output" # Output directory (default: ./<name>-output)
loglevel = "info" # Logging level: debug, info, warning, error
forks = 4 # Number of parallel jobs (adjust based on CPU cores)
cache = true # Enable caching (recommended)
error_strategy = "halt" # halt, ignore, or retry
num_retries = 3 # Number of retries if error_strategy = "retry"
Scheduler Configuration
Local execution (default):
scheduler = "local"
SLURM cluster:
scheduler = "slurm"
[scheduler_opts]
qsub_opts = "-p general -q general -N {job.name} -t {job.index}"
SGE cluster:
scheduler = "sge"
[scheduler_opts]
qsub_opts = "-V -cwd -j yes"
Google Cloud Batch:
# Use: immunopipe gbatch instead of immunopipe
# See gbatch skill for configuration
Plugin Options
[plugin_opts.report]
filters = ["name:Filter"] # Filter processes in report
[plugin_opts.runinfo]
# Runinfo plugin enabled by default
Routing to Process Skills
When user needs specific process configuration, route to the appropriate skill:
Core Input Processes
- SampleInfo: Use sampleinfo skill
- LoadingRNAFromSeurat: Use loadingrnafromseurat skill
- ScRepLoading: Use screploading skill
Preprocessing Processes
- SeuratPreparing: Use seuratpreparing skill
Clustering Processes
- SeuratClustering: Use seuratclustering skill
- SeuratClusteringOfAllCells: Use seuratclusteringofallcells skill
- SeuratSubClustering: Use seuratsubclustering skill
Cell Selection
- TOrBCellSelection: Use torbcellselection skill
Annotation Processes
- CellTypeAnnotation: Use celltypeannotation skill
- SeuratMap2Ref: Use seuratmap2ref skill
Marker Analysis
- ClusterMarkers: Use clustermarkers skill
- ClusterMarkersOfAllCells: Use clustermarkersofallcells skill
- MarkersFinder: Use markersfinder skill
- TopExpressingGenes: Use topexpressinggenes skill
- TopExpressingGenesOfAllCells: Use topexpressinggenesofallcells skill
TCR/BCR Analysis
- ScRepCombiningExpression: Use screpcombiningexpression skill
- CDR3Clustering: Use cdr3clustering skill
- TESSA: Use tessa skill
- CDR3AAPhyschem: Use cdr3aaphyschem skill
- ClonalStats: Use clonalstats skill
Downstream Analysis
- ModuleScoreCalculator: Use modulescorecalculator skill
- CellCellCommunication: Use cellcellcommunication skill
- CellCellCommunicationPlots: Use cellcellcommunicationplots skill
- SeuratClusterStats: Use seuratclusterstats skill
- ScFGSEA: Use scfgsea skill
- PseudoBulkDEG: Use pseudobulkdeg skill
Metabolic Analysis
- ScrnaMetabolicLandscape: Use scrnametaboliclandscape skill
Configuration File Structure
A complete TOML configuration file has three sections:
# 1. PIPELINE-LEVEL OPTIONS
name = "my_pipeline"
outdir = "./output"
forks = 4
# 2. PROCESS-LEVEL OPTIONS
[ProcessName]
cache = true
forks = 2 # Override pipeline-level forks for this process
[ProcessName.in]
# Input files specification
[ProcessName.envs]
# Environment variables (process parameters)
# 3. GOOGLE BATCH OPTIONS (if using immunopipe gbatch)
[cli-gbatch]
project = "my-gcp-project"
region = "us-central1"
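One TOML detail matters for this layout: a bare key belongs to the most recent table header, so pipeline-level options must appear before the first section. The fragment below illustrates the ordering; the sample file name is a placeholder.

```toml
# Pipeline-level keys first: these sit at the top level of the document.
name = "my_pipeline"
forks = 4

[SampleInfo.in]
infile = ["sample_info.txt"]  # placeholder sample sheet
# A bare key written below this header would be parsed as
# SampleInfo.in.<key>, not as a pipeline-level option, so keep
# name, outdir, and forks above the first [Section].
```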
Example Workflows
Example 1: Basic TCR Analysis
User request: "I have scRNA-seq and scTCR-seq data. I want basic analysis with T cell selection."
Response:
- Enable essential TCR processes: SampleInfo, ScRepLoading, SeuratPreparing, SeuratClustering, SeuratClusterStats
- Enable T cell selection: SeuratClusteringOfAllCells, TOrBCellSelection
- Route to sampleinfo skill to configure input files
- Route to each process skill for configuration
Minimal config:
name = "tcr_analysis"
forks = 4
[SampleInfo.in]
infile = ["sample_info.txt"]
[SeuratClusteringOfAllCells]
[TOrBCellSelection]
Example 2: Advanced RNA-only Analysis
User request: "RNA-only data. I need clustering, cell type annotation, marker finding, and pathway enrichment."
Response:
- Enable essential RNA processes: SampleInfo, SeuratPreparing, SeuratClustering, SeuratClusterStats
- Add requested analyses: CellTypeAnnotation, ClusterMarkers, ScFGSEA
- Route to individual skills for configuration
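A minimal config for this request might look like the sketch below, mirroring the minimal config of Example 1. The pipeline name and sample file name are placeholders, and leaving the optional process sections empty assumes their defaults are acceptable. In practice each section would be filled in through its own skill.

```toml
name = "rna_only_analysis"  # placeholder name
forks = 4

[SampleInfo.in]
infile = ["sample_info.txt"]  # placeholder sample sheet

# Requested analyses, left at defaults in this sketch.
[CellTypeAnnotation]
[ClusterMarkers]
[ScFGSEA]
```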
Example 3: Loading from Prepared Seurat Object
User request: "I already have a processed Seurat object. I want to run TCR analysis."
Response:
- Use LoadingRNAFromSeurat instead of SampleInfo + SeuratPreparing
- Enable TCR processes: ScRepLoading, SeuratClustering, etc.
- Set prepared = true in LoadingRNAFromSeurat to skip preprocessing
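The entry-point part of such a config could be sketched as follows. Several details are assumptions rather than confirmed options: that the input key is named infile, that prepared lives under the process's envs section (consistent with the envs convention described under Configuration File Structure), and the file path itself. Check them against the loadingrnafromseurat skill before use. TCR input is not shown and would be configured through the screploading skill.

```toml
name = "tcr_from_seurat"  # placeholder name
forks = 4

[LoadingRNAFromSeurat.in]
# Hypothetical path to the existing Seurat object; key name assumed.
infile = ["prepared_seurat.rds"]

[LoadingRNAFromSeurat.envs]
# Marks the object as already preprocessed so SeuratPreparing is skipped.
prepared = true
```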
Important Notes
Process Dependencies
Some processes have dependencies:
- ScRepCombiningExpression requires both ScRepLoading and RNA input
- ClusterMarkers requires SeuratClustering
- TOrBCellSelection usually follows SeuratClusteringOfAllCells
- CellCellCommunication requires clustering to be complete
Mutually Exclusive Options
- Use EITHER SampleInfo OR LoadingRNAFromSeurat as entry point (not both)
- If using TOrBCellSelection, typically enable SeuratClusteringOfAllCells first
- CellTypeAnnotation and SeuratMap2Ref serve similar purposes (can use both, but one usually sufficient)
Cache Strategy
- Set cache = "force" at pipeline level to reuse all previous results
- Set cache = false for specific process to force re-run
- Useful when tweaking visualization parameters without re-running analysis
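Combining the two settings might look like the sketch below: cached results are reused pipeline-wide while one process is forced to re-run. SeuratClusterStats is chosen only as an illustrative visualization-heavy process; substitute whichever process is being tuned.

```toml
# Pipeline level: reuse all previous results.
cache = "force"

# Process level: re-run only this process, e.g. after changing plot settings.
[SeuratClusterStats]
cache = false
```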
Configuration Validation
After generating configuration, validate with:
python -m immunopipe.validate_config config.toml
External References
When process options reference external packages, expand them:
Seurat Functions
- When seeing Seurat::FunctionName, check: https://satijalab.org/seurat/reference/
- Common functions: FindMarkers(), FindClusters(), SCTransform(), RunUMAP()
Plotthis Functions
- Plot types map to functions: bar → BarPlot, box → BoxPlot
- Full reference: https://pwwang.github.io/plotthis/reference/
DESeq2 Design
- For PseudoBulkDEG, design formulas use DESeq2 syntax
- Reference: https://bioconductor.org/packages/release/bioc/html/DESeq2.html
GSEA Databases
- For ScFGSEA, use GMT files from MSigDB
- Reference: https://www.gsea-msigdb.org/gsea/msigdb/
CellChat Database
- For CellCellCommunication, use CellChat databases
- Reference: http://www.cellchat.org/
Workflow Summary
- Assess data type (RNA-only vs TCR/BCR)
- Determine analysis goals (clustering, annotation, TCR analysis, etc.)
- Select essential processes based on data type
- Add optional processes based on goals
- Configure pipeline-level options (name, forks, scheduler)
- Route to individual process skills for detailed configuration
- Generate complete TOML file
- Validate configuration before running
Quick Start Templates
For quick starts, use these templates:
- Basic TCR: basic-tcr template skill
- Basic RNA-only: basic-rna template skill
- Advanced TCR: advanced-tcr template skill
- Metabolic analysis: metabolic template skill
- Cell communication: communication template skill
Error Prevention
Common configuration errors to avoid:
- Missing input specification: Always set [ProcessName.in] for entry processes
- TCR data without ScRepLoading: If TCRData/BCRData columns exist, enable ScRepLoading
- Contradictory process enablement: Don't enable both "OfAllCells" and regular versions without TOrBCellSelection
- Invalid gene names: Use human gene symbols (uppercase) or mouse (title case)
- Path issues: Use absolute paths or paths relative to config file location
- Resource limits: Set appropriate forks based on available CPU/memory
Next Steps
After generating config:
- Save to .toml file (e.g., config.toml)
- Run: immunopipe @config.toml
- Or use web UI: pipen board @config.toml
- Or use Google Batch: immunopipe gbatch @config.toml
For modifications, route to specific process skills based on what needs to change.