name: metabolicpathwayheterogeneity
description: Analyzes metabolic pathway heterogeneity within cell populations by calculating normalized enrichment scores (NES) for each pathway across different groups. Quantifies metabolic diversity and identifies pathways with variable activity patterns. Uses principal component analysis and GSEA to assess pathway heterogeneity, revealing subpopulation-specific metabolic states and transitions.
MetabolicPathwayHeterogeneity Process Configuration
Purpose
Analyzes metabolic pathway heterogeneity within cell populations by calculating normalized enrichment scores (NES) for each pathway across different groups. Quantifies metabolic diversity and identifies pathways with variable activity patterns. Uses principal component analysis and GSEA to assess pathway heterogeneity, revealing subpopulation-specific metabolic states and transitions.
When to Use
- Assess metabolic diversity: When you need to quantify metabolic heterogeneity within clusters or conditions
- Identify variable pathways: To find which metabolic pathways show high vs low heterogeneity across groups
- Compare metabolic variability: To compare heterogeneity between treatments, timepoints, or cell types
- After pathway activity analysis: Complements MetabolicPathwayActivity by adding a heterogeneity dimension
- Final metabolic analysis step: Typically the last process in the ScrnaMetabolicLandscape workflow
- Subpopulation discovery: When metabolic variability suggests hidden substructure
Configuration Structure
Process Enablement
MetabolicPathwayHeterogeneity is part of the ScrnaMetabolicLandscape group. Enable it by enabling the group:
[ScrnaMetabolicLandscape]
cache = true
Input Specification
MetabolicPathwayHeterogeneity receives input automatically from MetabolicInput (or MetabolicExprImputation if imputation is enabled):
[ScrnaMetabolicLandscape.in]
srtobj = ["SeuratClustering"] # Input from upstream clustering process
Environment Variables
All configuration is done at the ScrnaMetabolicLandscape group level:
[ScrnaMetabolicLandscape.envs]
# Core configuration (inherited by all metabolic processes)
gmtfile = "KEGG_2021_Human" # Metabolic pathways database
group_by = "seurat_clusters" # Column to group cells (e.g., "cluster")
subset_by = "treatment" # Optional: Subset by metadata column
ncores = 1 # Number of cores for parallelization
MetabolicPathwayHeterogeneity-Specific Configuration
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
# Principal component selection for heterogeneity analysis
select_pcs = 0.8 # Fraction or number of PCs to use (0-1 = fraction, >1 = number)
# Pathway significance filtering
pathway_pval_cutoff = 0.01 # P-value cutoff to select enriched pathways
# Parallelization
ncores = 1 # Cores for parallel processing (inherited from group if not set)
# fgsea parameters
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std" # Options: "std", "pos", "neg"
nproc = 1 # fgsea internal parallelization
minSize = 15 # Minimum pathway size
maxSize = 500 # Maximum pathway size
# Plots configuration
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Pathway Heterogeneity"]
plot_type = "dot" # Options: dot, heatmap
devpars = { res = 100 } # Plot resolution
# Multiple analysis cases (advanced)
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Treatment]
subset_by = "treatment"
group_by = "seurat_clusters"
select_pcs = 0.8
pathway_pval_cutoff = 0.01
Heterogeneity Analysis Method
Algorithm Overview
- PCA on pathway scores: Principal component analysis on pathway activity scores per group
- PC selection: Select the top PCs explaining variance (controlled by select_pcs)
- GSEA on PCs: Run fgsea for each PC to identify pathways correlating with variance
- NES calculation: Normalized Enrichment Score quantifies pathway-PC association
- Heterogeneity metric: Pathways with high |NES| show high heterogeneity across groups
PC Selection (select_pcs)
| Value | Interpretation | Use Case |
|---|---|---|
| 0.8 (default) | Use PCs explaining 80% variance | Balanced approach |
| 0.5 | Use PCs explaining 50% variance | Focus on major variation sources |
| 0.95 | Use PCs explaining 95% variance | Comprehensive analysis (slower) |
| 5 (integer) | Use exactly 5 PCs | Manual control |
| 10 | Use exactly 10 PCs | Fine-grained heterogeneity |
Recommendation: Start with 0.8 (default), increase to 0.9-0.95 if you suspect hidden heterogeneity.
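The fraction-versus-count rule for select_pcs can be illustrated with a short Python sketch. It is not the process's own code: select_n_pcs is a hypothetical helper, the PC variances are assumed to be sorted in decreasing order, and a fraction is assumed to mean the smallest number of leading PCs whose cumulative explained variance reaches it.

```python
from itertools import accumulate


def select_n_pcs(pc_variances, select_pcs):
    """Return how many leading PCs to keep (hypothetical helper, for illustration).

    pc_variances: variance of each PC, sorted in decreasing order
    select_pcs:   0 < value <= 1 -> fraction of total variance to reach
                  value > 1      -> fixed number of PCs (capped at the number available)
    """
    if select_pcs <= 0:
        raise ValueError("select_pcs must be positive")
    n_available = len(pc_variances)
    if select_pcs > 1:
        return min(int(select_pcs), n_available)
    total = sum(pc_variances)
    # Smallest k whose cumulative explained variance reaches the requested fraction;
    # the tolerance guards against floating-point round-off at exact boundaries.
    for k, cumulative in enumerate(accumulate(pc_variances), start=1):
        if cumulative / total >= select_pcs - 1e-9:
            return k
    return n_available


if __name__ == "__main__":
    variances = [6.0, 2.0, 1.0, 1.0]  # toy PC variances: 60%, 20%, 10%, 10%
    for value in (0.5, 0.8, 0.95, 3, 10):
        print(value, "->", select_n_pcs(variances, value))
    # prints, one per line: 0.5 -> 1, 0.8 -> 2, 0.95 -> 4, 3 -> 3, 10 -> 4
```

Note how 0.95 jumps straight to all four PCs in this toy case: the last few percent of variance often costs several extra PCs, which is why higher fractions make the analysis slower.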
Pathway P-value Cutoff
pathway_pval_cutoff = 0.01 # Only pathways with p < 0.01 are analyzed
- Purpose: Filter to significantly enriched pathways before heterogeneity analysis
- Lower values (0.001): Strict, only highly significant pathways
- Higher values (0.05): Permissive, include more pathways
- Default (0.01): Good balance for most analyses
FGSEA Score Type
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std" # Options: "std", "pos", "neg"
| Score Type | Description | Use Case |
|---|---|---|
| std | Standard GSEA (both directions) | Default; detects heterogeneity in both high/low activity |
| pos | Only positive enrichment | Focus on pathways with increased activity variation |
| neg | Only negative enrichment | Focus on pathways with decreased activity variation |
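For intuition about what NES and scoreType measure, the sketch below computes a GSEA running-sum enrichment score on a preranked gene list and normalizes it against random gene sets of the same size. It is a simplified, slow stand-in for fgsea, not its implementation: enrichment_score and normalized_enrichment_score are hypothetical names, score_type mimics the three scoreType options, and the statistic the genes are ranked by inside MetabolicPathwayHeterogeneity is left abstract here.

```python
import random
from statistics import mean


def enrichment_score(ranked, gene_set, score_type="std"):
    """Weighted GSEA running-sum enrichment score (weight exponent 1).

    ranked:     list of (gene, statistic) pairs sorted by statistic, descending
    gene_set:   set of gene names forming one pathway
    score_type: "std" = largest deviation in either direction,
                "pos" = largest positive deviation, "neg" = largest negative deviation
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    hit_weight = sum(abs(stat) for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_weight == 0:
        raise ValueError("all statistics of the gene set are zero")
    running = max_dev = min_dev = 0.0
    for (_, stat), hit in zip(ranked, hits):
        # Hits step up in proportion to |statistic|; misses step down uniformly.
        running += abs(stat) / hit_weight if hit else -1.0 / n_miss
        max_dev = max(max_dev, running)
        min_dev = min(min_dev, running)
    if score_type == "pos":
        return max_dev
    if score_type == "neg":
        return min_dev
    if score_type == "std":
        return max_dev if max_dev >= -min_dev else min_dev
    raise ValueError("score_type must be 'std', 'pos' or 'neg'")


def normalized_enrichment_score(ranked, gene_set, score_type="std", n_perm=1000, seed=0):
    """Return (NES, nominal p-value) using random same-size gene sets as the null."""
    es = enrichment_score(ranked, gene_set, score_type)
    genes = [gene for gene, _ in ranked]
    size = len(set(gene_set) & set(genes))
    rng = random.Random(seed)
    null = [enrichment_score(ranked, set(rng.sample(genes, size)), score_type)
            for _ in range(n_perm)]
    # Normalize only against null scores with the same sign as the observed ES.
    same_sign = [abs(x) for x in null if x != 0 and (x > 0) == (es > 0)]
    if es == 0 or not same_sign:
        return float("nan"), float("nan")
    nes = es / mean(same_sign)
    pval = (sum(1 for x in same_sign if x >= abs(es)) + 1) / (len(same_sign) + 1)
    return nes, pval


if __name__ == "__main__":
    # Toy ranking: 20 genes with statistics 20, 19, ..., 1
    ranked = [(f"G{i}", float(20 - i)) for i in range(20)]
    top = {"G0", "G1", "G2"}
    print(enrichment_score(ranked, top))  # close to 1.0: the set sits at the very top
    print(normalized_enrichment_score(ranked, top, n_perm=500, seed=1))
```

Dividing by the mean of same-sign null scores is what makes NES values comparable across gene sets of different sizes, whereas raw enrichment scores are not.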
GMT File Sources
The gmtfile parameter accepts:
- Built-in databases: "KEGG_2021_Human", "Reactome_Pathways_2024", "BioCarta_2016", "MSigDB_Hallmark_2020"
- Custom files: Local paths or URLs to GMT format files
- See /skills/processes/metabolicinput.md for detailed database options
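Before pointing gmtfile at a custom file, its layout (one gene set per line: name, description, then genes, all tab-separated) can be sanity-checked with a small parser such as the hypothetical parse_gmt below. It is a local check only and makes no claim about how the pipeline itself reads GMT files.

```python
def parse_gmt(text):
    """Parse GMT content into {pathway name: set of genes} (hypothetical helper).

    Each non-empty line is tab-separated: name, description, then one gene per field.
    """
    gene_sets = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        name, *rest = line.split("\t")
        if len(rest) < 2:
            raise ValueError(f"line {lineno}: expected a description and at least one gene")
        _description, *genes = rest
        # Drop empty fields, e.g. from a trailing tab.
        gene_sets[name] = {gene.strip() for gene in genes if gene.strip()}
    return gene_sets


if __name__ == "__main__":
    demo = (
        "TOY_GLYCOLYSIS\ttoy example\tHK2\tPFKP\tLDHA\n"
        "TOY_OXPHOS\ttoy example\tNDUFA1\tCOX4I1\t\n"  # trailing tab is ignored
    )
    for name, genes in parse_gmt(demo).items():
        print(name, sorted(genes))
```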
Configuration Examples
Minimal Configuration (Default Settings)
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.in]
srtobj = ["SeuratClustering"]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
Comprehensive Heterogeneity Analysis
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
ncores = 8
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.95 # Capture 95% of variance
pathway_pval_cutoff = 0.01 # Significant pathways only
ncores = 8
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std"
nproc = 8
minSize = 10
maxSize = 500
Treatment Comparison Heterogeneity
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "Reactome_Pathways_2024"
group_by = "seurat_clusters"
subset_by = "treatment" # Compare heterogeneity between treatments
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
pathway_pval_cutoff = 0.05 # More permissive for exploratory analysis
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Treatment Heterogeneity"]
plot_type = "dot"
devpars = { width = 1200, height = 800, res = 150 }
High-Resolution Publication Plots
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 10 # Use exactly 10 PCs
pathway_pval_cutoff = 0.01
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Pathway Heterogeneity"]
plot_type = "dot"
devpars = { width = 1600, height = 1200, res = 300 }
Multiple Analysis Cases
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
ncores = 8
# Case 1: Overall cluster heterogeneity
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Clusters]
group_by = "seurat_clusters"
select_pcs = 0.8
pathway_pval_cutoff = 0.01
plots = { "Cluster Heterogeneity" = { plot_type = "dot", devpars = { res = 150 } } }
# Case 2: Treatment-specific heterogeneity
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.cases.Treatment]
subset_by = "treatment"
group_by = "seurat_clusters"
select_pcs = 0.9
pathway_pval_cutoff = 0.01
plots = { "Treatment Heterogeneity" = { plot_type = "dot", devpars = { res = 150 } } }
Conservative Analysis (Major Variance Only)
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.5 # Only major variance (50%)
pathway_pval_cutoff = 0.001 # Very strict significance
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
scoreType = "std"
minSize = 20 # Larger pathways only
maxSize = 300
Common Patterns
Pattern 1: Standard Heterogeneity Analysis
Assess metabolic variability across clusters:
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
pathway_pval_cutoff = 0.01
Pattern 2: Treatment Response Variability
Compare heterogeneity between responders and non-responders:
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "Reactome_Pathways_2024"
subset_by = "response" # "responder" vs "nonresponder"
group_by = "seurat_clusters"
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.85
pathway_pval_cutoff = 0.01
Pattern 3: Timepoint Progression
Analyze heterogeneity changes over time:
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
subset_by = "timepoint" # e.g., "day0", "day7", "day14"
group_by = "seurat_clusters"
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.9 # Comprehensive to detect gradual changes
pathway_pval_cutoff = 0.01
Pattern 4: Energy Metabolism Focus
Analyze heterogeneity in glycolysis and OXPHOS:
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "/data/pathways/energy_metabolism.gmt" # Custom GMT
group_by = "seurat_clusters"
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
pathway_pval_cutoff = 0.05 # More permissive for focused analysis
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
minSize = 5 # Allow smaller custom pathways
Pattern 5: High-Throughput Parallel Execution
Large dataset with many groups:
[ScrnaMetabolicLandscape]
[ScrnaMetabolicLandscape.envs]
gmtfile = "KEGG_2021_Human"
group_by = "seurat_clusters"
ncores = 16
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8
ncores = 16
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
nproc = 1 # Parallelize at group level, not within fgsea
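The reason for keeping nproc = 1 here is arithmetic, assuming the two levels nest (each of the ncores group-level workers may start its own fgsea pool of nproc workers): the worst case is ncores * nproc concurrent workers. The helpers below are hypothetical and only perform that multiplication and comparison.

```python
import os


def planned_workers(ncores, nproc):
    """Worst-case concurrent workers if every group-level worker runs its own fgsea pool."""
    return ncores * nproc


def is_oversubscribed(ncores, nproc, available=None):
    """True if the nested worker count exceeds the CPUs available."""
    if available is None:
        available = os.cpu_count() or 1
    return planned_workers(ncores, nproc) > available


if __name__ == "__main__":
    print(is_oversubscribed(16, 1, available=16))  # False: 16 workers on 16 CPUs
    print(is_oversubscribed(16, 8, available=16))  # True: up to 128 workers on 16 CPUs
```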
Dependencies
Upstream Processes
- Required: MetabolicInput (part of the ScrnaMetabolicLandscape group)
- Optional: MetabolicExprImputation (if imputation is enabled with noimpute = false)
- Root: CombinedInput → requires SeuratClustering or a similar clustering process
Downstream Processes
- Parallel: Runs alongside MetabolicPathwayActivity and MetabolicFeatures (same group)
- Typically final: Usually the last metabolic analysis step
- Optional: Can feed into visualization or reporting processes
Data Requirements
- Seurat object with normalized expression data
- Metadata column specified in group_by (e.g., cluster assignments)
- Optional metadata column in subset_by for subset analysis
- GMT file with metabolic pathway gene sets matching Seurat object gene names
- Multiple groups required (>2) for meaningful heterogeneity analysis
Output Format
Output Files
MetabolicPathwayHeterogeneity generates the following outputs in the outdir directory (default: {{in.sobjfile | stem}}.pathwayhetero):
- Heterogeneity scores: TSV files with NES per pathway and PC
- Columns: pathway, PC1_NES, PC1_pval, PC2_NES, PC2_pval, ...
- Dot plots: Visualization of pathway heterogeneity across groups and subsets
- PCA results: PC variance explained, loadings
- Pathway rankings: Pathways ranked by heterogeneity magnitude
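Assuming the wide column layout listed above (pathway, PC1_NES, PC1_pval, ...), a results table could be filtered and ranked by heterogeneity magnitude with Python's csv module. rank_pathways is a hypothetical helper, the p-value cutoff is applied per PC, and the toy table holds made-up values for illustration only.

```python
import csv
import io


def rank_pathways(tsv_text, pval_cutoff=0.01):
    """Rank pathways by their largest |NES| among PCs with p < pval_cutoff.

    Assumes the wide layout: pathway, PC1_NES, PC1_pval, PC2_NES, PC2_pval, ...
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    ranked = []
    for row in rows:
        pcs = [col[: -len("_NES")] for col in row if col.endswith("_NES")]
        significant = [
            abs(float(row[f"{pc}_NES"]))
            for pc in pcs
            if float(row[f"{pc}_pval"]) < pval_cutoff
        ]
        if significant:  # drop pathways with no significant PC
            ranked.append((row["pathway"], max(significant)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy = (
        "pathway\tPC1_NES\tPC1_pval\tPC2_NES\tPC2_pval\n"
        "Glycolysis\t2.5\t0.001\t-1.2\t0.2\n"
        "OXPHOS\t-1.8\t0.004\t2.9\t0.03\n"
        "TCA\t0.9\t0.5\t0.7\t0.6\n"
    )
    print(rank_pathways(toy, 0.01))  # [('Glycolysis', 2.5), ('OXPHOS', 1.8)]
    print(rank_pathways(toy, 0.05))  # [('OXPHOS', 2.9), ('Glycolysis', 2.5)]
```

The two calls show how loosening the cutoff can reorder the ranking: OXPHOS moves to the top once its PC2 association (p = 0.03) is admitted.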
Result Interpretation
- High |NES|: Pathway shows high heterogeneity (variable activity across groups)
- Low |NES|: Pathway shows low heterogeneity (uniform activity across groups)
- Positive NES: Pathway correlated with PC (increases along PC axis)
- Negative NES: Pathway anti-correlated with PC (decreases along PC axis)
- P-value: Statistical significance of pathway-PC association
Biological Interpretation
- High heterogeneity pathways: Indicate metabolic subpopulations or transitions
- Low heterogeneity pathways: Indicate core/constitutive metabolism
- Group-specific patterns: Suggest differential metabolic regulation
- PC loadings: Reveal metabolic axes of variation
Validation Rules
Input Validation
- gmtfile must be a valid enrichit database name OR an accessible GMT file
- Gene names in the GMT file must match the Seurat object (case-sensitive)
- group_by column must exist in the Seurat object metadata
- If subset_by is specified, the column must exist and NA values will be removed
Parameter Validation
- select_pcs: If 0 < value <= 1, interpreted as a fraction of variance; if > 1, interpreted as a number of PCs
- pathway_pval_cutoff: Must be between 0 and 1 (typically 0.001-0.1)
- ncores: Must be a positive integer
- At least 2 groups required in group_by for heterogeneity analysis
FGSEA Validation
- scoreType must be one of: "std", "pos", "neg"
- minSize < maxSize
- minSize >= 1
- At least one pathway must meet size and significance criteria
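The rules above can be mirrored in a pre-flight check on your own configuration values before launching a long run. validate_envs is a hypothetical helper; it assumes pathway_pval_cutoff may equal 1 but not 0, and it does not replace the pipeline's own validation.

```python
VALID_SCORE_TYPES = {"std", "pos", "neg"}


def validate_envs(select_pcs, pathway_pval_cutoff, ncores, score_type,
                  min_size, max_size, n_groups):
    """Return a list of problems; an empty list means all checks pass (hypothetical helper)."""
    problems = []
    if select_pcs <= 0:
        problems.append("select_pcs must be > 0 (<= 1 is a fraction, > 1 a number of PCs)")
    if not 0 < pathway_pval_cutoff <= 1:
        problems.append("pathway_pval_cutoff must be between 0 and 1")
    if not isinstance(ncores, int) or ncores < 1:
        problems.append("ncores must be a positive integer")
    if score_type not in VALID_SCORE_TYPES:
        problems.append('scoreType must be one of "std", "pos", "neg"')
    if min_size < 1:
        problems.append("minSize must be >= 1")
    if min_size >= max_size:
        problems.append("minSize must be smaller than maxSize")
    if n_groups < 2:
        problems.append("group_by needs at least 2 groups")
    return problems


if __name__ == "__main__":
    print(validate_envs(0.8, 0.01, 8, "std", 15, 500, 6))  # []
    for problem in validate_envs(0, 1.5, 0, "both", 500, 15, 1):
        print("-", problem)  # six problems are reported
```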
Troubleshooting
Issue: All heterogeneity scores are zero or very low
Cause: Groups have similar metabolic profiles or insufficient variance
Solution:
# Increase PC capture
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.95 # Capture more variance
# Relax pathway significance
pathway_pval_cutoff = 0.05
# Check if groups are truly different
# May need to refine group_by or subset_by columns
Issue: Process too slow
Cause: High select_pcs or insufficient parallelization
Solution:
# Reduce PCs
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.7 # Use fewer PCs
# Increase parallelization
ncores = 8
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
nproc = 8
Issue: No significant pathways after filtering
Cause: pathway_pval_cutoff too strict or weak biological signal
Solution:
# Relax cutoff
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
pathway_pval_cutoff = 0.05 # Increase from 0.01
# Or reduce pathway size filters
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.fgsea_args]
minSize = 10
maxSize = 800
Issue: Memory errors during PCA
Cause: Too many pathways or groups
Solution:
# Reduce number of PCs
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 5 # Use fixed small number
# Or filter pathways more aggressively
pathway_pval_cutoff = 0.001
# Reduce parallelization
ncores = 2
Issue: Results don't make biological sense
Cause: Inappropriate PC selection or wrong grouping variable
Solution:
# Try different PC thresholds
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
select_pcs = 0.8 # Default; compare with a fixed number, e.g. select_pcs = 10
# Verify grouping variable is biologically meaningful
[ScrnaMetabolicLandscape.envs]
group_by = "refined_clusters" # More biologically relevant than raw seurat_clusters
# Check if subset_by is creating meaningful comparisons
subset_by = "cell_type" # More specific than "treatment"
Issue: Dot plot unreadable (too many pathways)
Cause: Too many significant pathways
Solution:
# Stricter filtering
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs]
pathway_pval_cutoff = 0.001 # Very strict
# Or use custom GMT with fewer pathways
[ScrnaMetabolicLandscape.envs]
gmtfile = "/data/pathways/core_metabolism.gmt"
# Increase plot size
[ScrnaMetabolicLandscape.MetabolicPathwayHeterogeneity.envs.plots."Pathway Heterogeneity"]
plot_type = "dot"
devpars = { width = 2000, height = 1500, res = 150 }
Issue: Gene name mismatch errors
Cause: GMT file gene names don't match the Seurat object
Solution:
- Check gene format: Human (UPPERCASE), Mouse (TitleCase)
- Verify GMT file format: name\tdescription\tgene1\tgene2\tgene3 (tab-separated)
- Ensure gene IDs match (e.g., both ENSEMBL or both symbols)
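To locate the mismatch, compare each pathway's genes with the genes in the Seurat object, both exactly and ignoring case. The sketch below does that with hypothetical names; it assumes, as fgsea does, that minSize and maxSize apply to the overlap with the genes present in the data, so a pathway whose genes only match after case-folding drops out entirely.

```python
def overlap_report(gene_sets, expressed_genes, min_size=15, max_size=500):
    """Per pathway: exact matches, case-only matches, and whether the exact
    overlap falls within [min_size, max_size] (hypothetical helper)."""
    expressed = set(expressed_genes)
    expressed_upper = {gene.upper() for gene in expressed}
    report = {}
    for name, genes in gene_sets.items():
        exact = genes & expressed
        # Genes that would match only if case were ignored (e.g. HK2 vs Hk2).
        case_only = {g for g in genes - exact if g.upper() in expressed_upper}
        report[name] = {
            "exact": len(exact),
            "case_only": len(case_only),
            "within_size_limits": min_size <= len(exact) <= max_size,
        }
    return report


if __name__ == "__main__":
    human_style = {"TOY_GLYCOLYSIS": {"HK2", "PFKP", "LDHA"}}
    mouse_style_genes = ["Hk2", "Pfkp", "Ldha", "Actb"]
    print(overlap_report(human_style, mouse_style_genes, min_size=2))
    # {'TOY_GLYCOLYSIS': {'exact': 0, 'case_only': 3, 'within_size_limits': False}}
```

A high case_only count with zero exact matches points to a species or capitalization mismatch rather than a missing pathway.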
Issue: Insufficient groups for heterogeneity analysis
Cause: Only 1-2 groups in group_by column
Solution:
# Use different grouping variable with more groups
[ScrnaMetabolicLandscape.envs]
group_by = "seurat_clusters" # Should have >2 clusters
# Or add subset_by to create comparisons
subset_by = "treatment" # Creates multiple group sets
Issue: PCA explains very little variance
Cause: Pathways have little variation across groups
Solution:
- Verify upstream pathway activity analysis completed successfully
- Check if groups are truly metabolically distinct
- Try different pathway database (e.g., KEGG → Reactome)
- Consider a different group_by variable
External References
Original Paper
Metabolic landscape methodology:
- Xiao, Z. et al. (2019). Metabolic landscape of the tumor microenvironment at single cell resolution. Nature Communications, 10, 3763. https://www.nature.com/articles/s41467-019-11738-0
Analytical Methods
PCA (Principal Component Analysis):
- Standard dimensionality reduction technique for identifying axes of variation
- https://en.wikipedia.org/wiki/Principal_component_analysis
GSEA (Gene Set Enrichment Analysis):
- Subramanian, A. et al. (2005). Gene set enrichment analysis. PNAS, 102(43), 15545-15550. https://www.pnas.org/doi/10.1073/pnas.0506580102
fgsea (Fast GSEA):
- Korotkevich, G. et al. (2021). Fast gene set enrichment analysis. bioRxiv. https://www.biorxiv.org/content/10.1101/060012v3
Tool Documentation
- fgsea package: https://rdrr.io/bioc/fgsea/man/fgsea.html
- biopipen VizGSEA: https://pwwang.github.io/biopipen.utils.R/reference/VizGSEA.html
- biopipen metabolic pipeline: https://pwwang.github.io/biopipen/pipelines/scrna_metabolic/
GMT Databases
- MSigDB: http://www.gsea-msigdb.org/gsea/msigdb/
- KEGG: https://www.genome.jp/kegg/pathway.html
- Reactome: https://reactome.org/
- enrichit database list: See /skills/processes/metabolicinput.md
Related Skills
- ScrnaMetabolicLandscape: /skills/processes/scrnametaboliclandscape.md (full metabolic analysis group)
- MetabolicInput: /skills/processes/metabolicinput.md (input preparation and GMT databases)
- MetabolicPathwayActivity: /skills/processes/metabolicpathwayactivity.md (pathway activity scoring, AUCell-based)
- MetabolicFeatures: /skills/processes/metabolicfeatures.md (pathway enrichment analysis, FGSEA-based)
Decision Tree for Heterogeneity Analysis
Start: MetabolicPathwayHeterogeneity
│
├─ Do groups show obvious metabolic differences?
│ ├─ YES → Use default settings (select_pcs = 0.8)
│ └─ NO → Increase PC capture (select_pcs = 0.95)
│
├─ How many groups in analysis?
│ ├─ 2-5 groups → Use select_pcs = 0.8 (major variance)
│ ├─ 6-10 groups → Use select_pcs = 0.85 (balanced)
│ └─ >10 groups → Use select_pcs = 0.9 (comprehensive)
│
├─ Exploratory vs confirmatory analysis?
│ ├─ Exploratory → pathway_pval_cutoff = 0.05 (permissive)
│ └─ Confirmatory → pathway_pval_cutoff = 0.01 (default)
│
└─ Computational resources available?
├─ Limited → select_pcs = 0.7, ncores = 2
├─ Moderate → select_pcs = 0.8, ncores = 4
└─ Abundant → select_pcs = 0.95, ncores = 16
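The branches of the tree can overlap (for example, many groups but limited resources), so applying it programmatically requires a tie-breaking rule. The sketch below is one possible reading, not an official rule: it takes the larger select_pcs suggested by the first two branches and then caps it at the value suggested for the available resources. suggest_settings is a hypothetical helper.

```python
def suggest_settings(n_groups, exploratory, obvious_differences, resources):
    """One reading of the decision tree (hypothetical helper).

    resources: "limited", "moderate" or "abundant"
    """
    by_differences = 0.8 if obvious_differences else 0.95
    if n_groups <= 5:
        by_groups = 0.8
    elif n_groups <= 10:
        by_groups = 0.85
    else:
        by_groups = 0.9
    # Assumed tie-break: resources act as an upper bound on select_pcs.
    cap_pcs, ncores = {
        "limited": (0.7, 2),
        "moderate": (0.8, 4),
        "abundant": (0.95, 16),
    }[resources]
    return {
        "select_pcs": min(max(by_differences, by_groups), cap_pcs),
        "pathway_pval_cutoff": 0.05 if exploratory else 0.01,
        "ncores": ncores,
    }


if __name__ == "__main__":
    print(suggest_settings(4, exploratory=False, obvious_differences=True, resources="moderate"))
    print(suggest_settings(12, exploratory=True, obvious_differences=False, resources="abundant"))
```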
Interpretation Guide
High Heterogeneity Pathways (|NES| > 2)
- Biological significance: Indicates metabolic subpopulations or transitional states
- Follow-up: Investigate which groups drive heterogeneity (check PC loadings)
- Examples: Glycolysis (Warburg effect heterogeneity), fatty acid oxidation (metabolic flexibility)
Low Heterogeneity Pathways (|NES| < 1)
- Biological significance: Core/constitutive metabolism shared across groups
- Follow-up: May serve as normalizing factors or housekeeping metabolism
- Examples: Basic nucleotide metabolism, core TCA cycle
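The two rules of thumb can be applied mechanically with a small helper. heterogeneity_level is hypothetical, and the "intermediate" label for 1 <= |NES| <= 2 is an added name for the range this guide does not cover.

```python
def heterogeneity_level(nes):
    """Bucket a pathway by |NES| using the guide's rules of thumb (> 2 high, < 1 low)."""
    magnitude = abs(nes)
    if magnitude > 2:
        return "high"
    if magnitude < 1:
        return "low"
    return "intermediate"  # 1 <= |NES| <= 2 is not covered by the guide


if __name__ == "__main__":
    toy_nes = {"PathwayA": 2.4, "PathwayB": -2.1, "PathwayC": 0.6, "PathwayD": 1.5}
    for pathway, nes in toy_nes.items():
        print(pathway, heterogeneity_level(nes))
```

Using the absolute value means a strongly negative NES counts as high heterogeneity too, consistent with the guide's |NES| thresholds.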
PC Interpretation
- PC1: Usually captures major metabolic axis (e.g., proliferative vs quiescent)
- PC2-3: Secondary axes (e.g., treatment response, differentiation state)
- Higher PCs: Fine-grained variation or technical noise
Comparing Cases
When using multiple cases (e.g., treatment vs control):
- Similar heterogeneity: Pathway variability intrinsic to cell populations
- Differential heterogeneity: Treatment-induced metabolic plasticity or restriction