name: cobrapy-metabolic-modeling description: "Constraint-based (COBRA) analysis of genome-scale metabolic models: FBA, FVA, knockouts, flux sampling, production envelopes, gapfilling, media optimization. Use for strain design, essential gene ID, flux analysis. For kinetic modeling use tellurium; for visualization use Escher." license: GPL-2.0
COBRApy — Constraint-Based Metabolic Modeling
Overview
COBRApy is a Python package for constraint-based reconstruction and analysis (COBRA) of genome-scale metabolic models. It provides flux balance analysis (FBA), flux variability analysis (FVA), gene and reaction knockout screens, flux sampling, production envelopes, gapfilling, and media optimization on SBML-format metabolic networks.
When to Use
- Predicting microbial growth rates under different nutrient conditions (FBA)
- Identifying essential genes or reactions via single and double knockout screens
- Determining flux ranges and alternative optimal solutions (FVA)
- Sampling feasible flux distributions to characterize metabolic flexibility
- Designing minimal growth media or optimizing carbon sources
- Computing production envelopes for metabolic engineering targets
- Gapfilling incomplete draft models using a universal reaction database
- For kinetic modeling or dynamic ODE-based models, use Tellurium instead
- For pathway visualization on metabolic maps, use Escher instead
Prerequisites
- Python packages:
cobra(includes GLPK solver),numpy,pandas - Optional solvers: CPLEX or Gurobi (faster for large models, require license)
- Data: SBML (.xml), JSON, or YAML metabolic model files; available from BiGG Models, AGORA, or ModelSEED
pip install cobra
Quick Start
from cobra.io import load_model
model = load_model("textbook") # E. coli core model
print(f"Model: {model.id} — {len(model.reactions)} rxns, {len(model.metabolites)} mets, {len(model.genes)} genes")
solution = model.optimize()
print(f"Growth rate: {solution.objective_value:.4f} /h")
print(f"Status: {solution.status}")
# Model: e_coli_core — 95 rxns, 72 mets, 137 genes
# Growth rate: 0.8739 /h
# Status: optimal
Core API
1. Model I/O
Load bundled models and read/write standard formats.
from cobra.io import load_model, read_sbml_model, write_sbml_model, load_json_model, save_json_model
# Bundled: "textbook" (95 rxns), "ecoli" (2583 rxns), "salmonella"
model = load_model("textbook")
# model = read_sbml_model("my_model.xml") # from SBML file
# model = load_json_model("my_model.json") # from JSON file
write_sbml_model(model, "output_model.xml")
save_json_model(model, "output_model.json")
print(f"Saved model: {model.id}")
2. Model Structure and Components
Access reactions, metabolites, and genes via DictList containers.
from cobra.io import load_model
model = load_model("textbook")
# Inspect a reaction
rxn = model.reactions.get_by_id("PFK")
print(f"Reaction: {rxn.id} — {rxn.name}")
print(f"Equation: {rxn.reaction}")
print(f"Bounds: {rxn.bounds}, GPR: {rxn.gene_reaction_rule}")
# Inspect a metabolite
met = model.metabolites.get_by_id("atp_c")
print(f"Metabolite: {met.id}, Formula: {met.formula}, Compartment: {met.compartment}")
# Query and list exchange reactions
atp_rxns = model.reactions.query("atp", attribute="name")
print(f"ATP-related reactions: {len(atp_rxns)}, Exchange reactions: {len(model.exchanges)}")
3. Flux Balance Analysis (FBA)
Predict optimal flux distributions by maximizing an objective.
from cobra.io import load_model
from cobra.flux_analysis import pfba
model = load_model("textbook")
# Standard FBA
solution = model.optimize()
print(f"Growth: {solution.objective_value:.4f} /h, Active fluxes: {(solution.fluxes.abs() > 1e-6).sum()}")
# Parsimonious FBA — same growth, minimal total flux
pfba_sol = pfba(model)
print(f"pFBA total flux: {pfba_sol.fluxes.abs().sum():.1f} vs standard: {solution.fluxes.abs().sum():.1f}")
# Change objective; slim_optimize for speed
from cobra.io import load_model
model = load_model("textbook")
with model:
model.objective = "ATPM"
print(f"Max ATPM flux: {model.optimize().objective_value:.2f}")
print(f"Growth (slim): {model.slim_optimize():.4f}") # no flux vector, faster
4. Flux Variability Analysis (FVA)
Determine feasible flux ranges at or near optimality.
from cobra.io import load_model
from cobra.flux_analysis import flux_variability_analysis
model = load_model("textbook")
fva = flux_variability_analysis(model, fraction_of_optimum=1.0)
fva_90 = flux_variability_analysis(model, fraction_of_optimum=0.9)
fva["range"] = fva["maximum"] - fva["minimum"]
fva_90["range"] = fva_90["maximum"] - fva_90["minimum"]
print(f"Mean range at 100%: {fva['range'].mean():.2f}, at 90%: {fva_90['range'].mean():.2f}")
# Loopless FVA on specific reactions
from cobra.io import load_model
from cobra.flux_analysis import flux_variability_analysis
model = load_model("textbook")
fva_ll = flux_variability_analysis(
model, loopless=True, reaction_list=["PFK", "PGI", "FBA", "TPI", "GAPD"],
)
print(fva_ll)
5. Gene and Reaction Deletions
Screen for essential genes/reactions via knockout simulations.
from cobra.io import load_model
from cobra.flux_analysis import single_gene_deletion, double_gene_deletion
model = load_model("textbook")
wt_growth = model.slim_optimize()
# Single gene deletions
gene_results = single_gene_deletion(model)
gene_results["growth_fraction"] = gene_results["growth"] / wt_growth
essential = gene_results[gene_results["growth_fraction"] < 0.01]
print(f"Essential genes: {len(essential)} / {len(model.genes)}")
# Double deletions (synthetic lethality) — use multiprocessing
double_results = double_gene_deletion(model, processes=4)
print(f"Double deletion results: {double_results.shape}")
6. Growth Media and Minimal Media
Modify nutrient availability and compute minimal media.
from cobra.io import load_model
from cobra.medium import minimal_medium
model = load_model("textbook")
# View current medium
for rxn_id, flux in sorted(model.medium.items()):
print(f" {rxn_id}: {flux}")
# Anaerobic switch via context manager
with model:
medium = model.medium
medium["EX_o2_e"] = 0.0
model.medium = medium
print(f"Anaerobic growth: {model.slim_optimize():.4f} /h")
# Minimal medium
min_med = minimal_medium(model, minimize_components=True, open_exchanges=True)
print(f"Minimal medium: {len(min_med)} components")
7. Flux Sampling
Sample feasible flux distributions for variability analysis.
from cobra.io import load_model
from cobra.sampling import sample
model = load_model("textbook")
samples = sample(model, n=500, method="optgp")
print(f"Samples shape: {samples.shape}") # (500, n_reactions)
print(f"PFK flux: mean={samples['PFK'].mean():.2f}, std={samples['PFK'].std():.2f}")
8. Production Envelopes and Gapfilling
Compute phenotype phase planes and fill model gaps.
from cobra.io import load_model
from cobra.flux_analysis import production_envelope
model = load_model("textbook")
envelope = production_envelope(
model,
reactions=model.reactions.get_by_id("EX_ac_e"),
carbon_sources=model.reactions.get_by_id("EX_glc__D_e"),
)
print(f"Envelope: {len(envelope)} points")
print(envelope[["flux_minimum", "flux_maximum", "carbon_yield_minimum", "carbon_yield_maximum"]].head())
# Gapfilling: restore growth after reaction removal
from cobra.io import load_model
from cobra.flux_analysis.gapfilling import gapfill
model = load_model("textbook")
universal = load_model("textbook") # In practice, use a universal reaction DB
with model:
model.remove_reactions([model.reactions.get_by_id("PFK")])
print(f"Growth after removing PFK: {model.slim_optimize():.4f}")
for rxn in gapfill(model, universal)[0]:
print(f" Gapfill suggests: {rxn.id}")
Key Concepts
DictList Objects
Reactions, metabolites, and genes are stored in DictList — ordered, indexable, and accessible by ID.
rxn = model.reactions[0] # by index
rxn = model.reactions.get_by_id("PFK") # by ID
matches = model.reactions.query("phospho") # keyword search
Exchange Reactions
EX_ prefix reactions represent system boundary. Positive flux = secretion; negative = uptake. Managed via model.medium dict.
Gene-Reaction Rules (GPR)
Boolean expressions linking genes to reactions: (b0726 and b0727) or b1234. Gene knockout propagates through GPR logic.
Context Managers
with model: snapshots state and reverts all changes on exit (bounds, objective, medium, knockouts). Nesting supported.
Common Workflows
Workflow 1: Gene Knockout Screen
Goal: Identify essential, growth-reducing, and neutral genes.
from cobra.io import load_model
from cobra.flux_analysis import single_gene_deletion
model = load_model("textbook")
wt_growth = model.slim_optimize()
results = single_gene_deletion(model)
results["growth_fraction"] = results["growth"] / wt_growth
essential = results[results["growth_fraction"] < 0.01]
reduced = results[(results["growth_fraction"] >= 0.01) & (results["growth_fraction"] < 0.9)]
neutral = results[results["growth_fraction"] >= 0.9]
print(f"Essential: {len(essential)}, Reduced: {len(reduced)}, Neutral: {len(neutral)}")
for idx in essential.index:
print(f" Essential gene: {list(idx)[0]}")
Workflow 2: Media Optimization
Goal: Find minimal medium at different growth targets; compare aerobic vs anaerobic.
from cobra.io import load_model
from cobra.medium import minimal_medium
import pandas as pd
model = load_model("textbook")
results = []
for frac in [0.1, 0.5, 0.8, 1.0]:
with model:
target = model.slim_optimize() * frac
model.reactions.get_by_id("Biomass_Ecoli_core").lower_bound = target
try:
mm = minimal_medium(model, minimize_components=True, open_exchanges=True)
results.append({"growth_frac": frac, "n_components": len(mm)})
except Exception:
results.append({"growth_frac": frac, "n_components": None})
print(pd.DataFrame(results).to_string(index=False))
for label, o2 in [("Aerobic", 1000.0), ("Anaerobic", 0.0)]:
with model:
medium = model.medium
medium["EX_o2_e"] = o2
model.medium = medium
print(f"{label} growth: {model.slim_optimize():.4f} /h")
Workflow 3: Production Strain Design
Goal: Design a strain with maximized target metabolite production. Combines modules 3, 5, and 8.
- Load model and compute wild-type production envelope for target metabolite
- Screen single gene knockouts for overproducers (filter where target secretion increases)
- Combine top knockouts and validate with FBA under production conditions
- Gapfill if knockouts create infeasibilities
- Compute final production envelope and compare to wild-type
Key Parameters
| Parameter | Module/Function | Default | Range / Options | Effect |
|---|---|---|---|---|
fraction_of_optimum | flux_variability_analysis | 1.0 | 0.0-1.0 | Fraction of max objective to maintain; lower = wider flux ranges |
loopless | flux_variability_analysis | False | True, False | Eliminate thermodynamically infeasible loops; slower |
method | sample | "optgp" | "optgp", "achr" | Sampling algorithm; optgp supports parallelism |
n | sample | required | 100-10000 | Number of flux samples to draw |
processes | sample, double_gene_deletion | 1 | 1-N_cores | Parallel worker processes |
minimize_components | minimal_medium | False | True, False | True = fewest nutrients (MILP); False = minimize total flux |
open_exchanges | minimal_medium | False | True, False | Allow all exchanges as nutrient candidates |
carbon_sources | production_envelope | None | Reaction object | Compute carbon yield alongside flux envelope |
thinning | sample | 100 | 1-1000 | Steps between kept samples; higher = less correlated |
Best Practices
-
Use context managers for temporary changes:
with model:reverts all modifications on exit.with model: model.reactions.PFK.knock_out() print(model.slim_optimize()) # modified # model is restored here -
Validate with
slim_optimize()before analysis: Quick feasibility check before expensive operations (FVA, sampling). -
Check
solution.statusafter optimization: Always verify"optimal"before interpreting fluxes. -
Use loopless FVA when thermodynamic feasibility matters: Standard FVA can include infeasible internal cycles that inflate flux ranges.
-
Parallelize expensive operations: Sampling and double deletions support
processesparameter. -
Prefer SBML for model exchange: Community standard supported by all COBRA tools.
-
Use
slim_optimize()in loops: Skips full flux vector construction, significantly faster for screening. -
Validate flux samples: Use
sampler.validate(samples)to check stoichiometric and bound constraints.
Common Recipes
Recipe: Parameter Scan (Glucose Uptake Rate)
from cobra.io import load_model
model = load_model("textbook")
print("Glucose_uptake | Growth_rate")
for glc_uptake in [1, 2, 5, 10, 15, 20]:
with model:
model.reactions.get_by_id("EX_glc__D_e").lower_bound = -glc_uptake
growth = model.slim_optimize()
print(f" {glc_uptake:>13} | {growth:.4f}")
Recipe: Batch Condition Analysis
from cobra.io import load_model
import pandas as pd
model = load_model("textbook")
conditions = [
{"name": "Rich aerobic", "EX_o2_e": 1000, "EX_glc__D_e": 10},
{"name": "Anaerobic", "EX_o2_e": 0, "EX_glc__D_e": 10},
{"name": "Low glucose", "EX_o2_e": 1000, "EX_glc__D_e": 1},
]
results = []
for c in conditions:
with model:
medium = model.medium
medium["EX_o2_e"], medium["EX_glc__D_e"] = c["EX_o2_e"], c["EX_glc__D_e"]
model.medium = medium
results.append({"condition": c["name"], "growth": round(model.slim_optimize(), 4)})
print(pd.DataFrame(results).to_string(index=False))
Recipe: Model Validation Checklist
from cobra.io import load_model
from cobra.flux_analysis import find_blocked_reactions
model = load_model("textbook")
# Feasibility, mass balance, dead-ends, blocked reactions
print(f"Growth feasible: {model.slim_optimize() > 0}")
print(f"Missing formula: {sum(1 for m in model.metabolites if m.formula is None)}")
print(f"Dead-end metabolites: {sum(1 for m in model.metabolites if len(m.reactions) == 1)}")
print(f"Blocked reactions: {len(find_blocked_reactions(model))} / {len(model.reactions)}")
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
solution.status == "infeasible" | Constraints cannot be simultaneously satisfied | Check medium has required nutrients; verify reaction bounds; use model.medium to restore defaults |
solution.status == "unbounded" | No upper bound on fluxes | Set finite upper bounds on exchange reactions |
| Very slow optimization | Large model + default GLPK solver | Install CPLEX or Gurobi: model.solver = "cplex" |
ValueError setting bounds | lower_bound > upper_bound temporarily | Set as tuple: rxn.bounds = (new_lb, new_ub) |
| Gene deletion returns NaN | Knockout makes model infeasible | Expected for essential genes; classify as essential |
IOError reading SBML | Invalid SBML or missing namespace | Validate at sbml.org; try cobra.io.sbml.validate_sbml_model(path) |
| Flux samples fail validation | Numerical solver tolerance | Increase thinning parameter; try method="achr" |
Bundled Resources
1 reference file:
references/api_workflows.md— Consolidates API quick reference and advanced workflows. Covers: detailed function signatures, solver configuration (GLPK/CPLEX/Gurobi), advanced analysis (find_blocked_reactions,find_essential_genes/find_essential_reactions), model manipulation (adding reactions/metabolites/genes), flux sample validation, and 5 workflow examples (knockout with visualization, media design, flux space exploration, production strain design, model validation). Relocated inline: basic FBA/FVA/deletion/sampling (Core API modules 3-7). Omitted: geometric FBA internals, MIP gap configuration — consult COBRApy docs.
Related Skills
- escher — interactive metabolic map visualization; use JSON models from COBRApy
- bioservices — retrieve models from BiGG, BioModels, or KEGG for import into COBRApy
- networkx — metabolic network topology analysis; export stoichiometry matrix as graph
References
- COBRApy documentation — official API reference and tutorials
- BiGG Models — curated genome-scale metabolic model repository
- COBRApy GitHub — source code and issue tracker
- Ebrahim et al. (2013) "COBRApy: COnstraints-Based Reconstruction and Analysis for Python" — BMC Systems Biology