name: deepchem description: "Deep learning for drug discovery. 60+ models (GCN, GAT, AttentiveFP, MPNN, ChemBERTa, GROVER), 50+ featurizers, MoleculeNet benchmarks, HPO, transfer learning. Unified load-featurize-split-train-evaluate API. For fingerprints use rdkit-cheminformatics; for featurization-only use molfeat." license: MIT
DeepChem — Deep Learning for Drug Discovery
Overview
DeepChem is an open-source Python framework providing a unified API for molecular machine learning across drug discovery, materials science, and quantum chemistry. It wraps 60+ model architectures (graph neural networks, transformers, classical ML) with 50+ molecular featurizers and standardized datasets (MoleculeNet), enabling end-to-end workflows from SMILES strings to trained predictive models.
When to Use
- Predicting molecular properties (solubility, toxicity, binding affinity) from SMILES
- Benchmarking models on MoleculeNet standardized datasets (BBBP, Tox21, ESOL, FreeSolv, etc.)
- Training graph neural networks on molecular graphs (GCN, GAT, AttentiveFP, MPNN, DMPNN)
- Fine-tuning pretrained chemical language models (ChemBERTa, GROVER, MolFormer)
- Running hyperparameter optimization for molecular ML models
- Virtual screening and hit prioritization with trained models
- Materials property prediction from crystal structures (CGCNN, MEGNet)
- Protein-ligand interaction modeling and binding affinity prediction
- For fingerprint-based cheminformatics without deep learning, use `rdkit-cheminformatics` instead
- For featurization only (no model training), use `molfeat-molecular-featurization` instead
Prerequisites
- Python packages: `deepchem` (core), `torch` or `tensorflow` (backend-dependent models)
- GPU: Recommended for graph neural networks and transformer models; CPU sufficient for classical ML and fingerprint models
- Data: SMILES strings with property labels (CSV), or MoleculeNet datasets (auto-downloaded)
# Core installation (includes RDKit, scikit-learn, XGBoost)
pip install deepchem
# With PyTorch backend (GNN models)
pip install deepchem[torch]
# With TensorFlow backend (legacy models)
pip install deepchem[tensorflow]
# Full installation (all backends + extras)
pip install deepchem[all]
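To verify the install, importing the package and printing the version is usually enough:
import deepchem as dc
print(dc.__version__)  # e.g. '2.8.0'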
Quick Start
import deepchem as dc
# Load MoleculeNet dataset with featurization + scaffold split
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="ECFP")
train, valid, test = datasets
# Train and evaluate a multitask regressor
model = dc.models.MultitaskRegressor(n_tasks=1, n_features=1024, dropouts=0.2)
model.fit(train, nb_epoch=50)
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print(f"Test R2: {model.evaluate(test, [metric])}") # {'pearson_r2_score': ~0.7}
Core API
Module 1: Data Loading and Processing
Load molecular data from CSV files or MoleculeNet benchmark datasets.
import deepchem as dc
# Load from CSV (SMILES + property columns)
loader = dc.data.CSVLoader(
    tasks=["measured_log_solubility"],
    feature_field="smiles",
    featurizer=dc.feat.CircularFingerprint(size=2048, radius=3)
)
dataset = loader.create_dataset("solubility_data.csv")
print(f"Samples: {dataset.X.shape[0]}, Features: {dataset.X.shape[1]}")
# Samples: 1128, Features: 2048
# Load from SDF (3D structures)
sdf_loader = dc.data.SDFLoader(
    tasks=["activity"],
    featurizer=dc.feat.CoulombMatrix(max_atoms=50)
)
dataset_3d = sdf_loader.create_dataset("molecules.sdf")
# Load MoleculeNet benchmark datasets (auto-download + featurize + split)
# Available: load_delaney, load_bbbp, load_tox21, load_hiv, load_qm7, load_qm9, etc.
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer="ECFP", splitter="scaffold")
train, valid, test = datasets
print(f"Tasks: {len(tasks)}, Train: {len(train)}, Test: {len(test)}")
# Tasks: 12, Train: ~6264, Test: ~631
# Inverse-transform predictions back to the original scale
# (assumes a model trained on this dataset; see Module 3)
y_pred = model.predict(test)
y_original = transformers[0].untransform(y_pred)
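Featurization can dominate runtime on large files. A hedged caching pattern, assuming the data_dir argument of create_dataset (which persists the featurized shards as a DiskDataset), lets later sessions reload without re-featurizing:
# Featurize once, persisting shards to disk
dataset = loader.create_dataset("solubility_data.csv", data_dir="featurized/")
# Later sessions: reload directly from the persisted shards
reloaded = dc.data.DiskDataset(data_dir="featurized/")
print(len(reloaded) == len(dataset))  # True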
Module 2: Molecular Featurization
Convert molecules to numerical representations for ML. DeepChem provides 50+ featurizers spanning fingerprints, descriptors, graph features, and Coulomb matrices.
import deepchem as dc
smiles = ["CCO", "CC(=O)O", "c1ccccc1", "CC(C)O"]
# Fingerprints (most common for classical ML)
ecfp = dc.feat.CircularFingerprint(size=2048, radius=3)
fp_features = ecfp.featurize(smiles)
print(f"ECFP shape: {fp_features.shape}") # (4, 2048)
# RDKit descriptors (interpretable physicochemical properties)
rdkit_desc = dc.feat.RDKitDescriptors()
desc_features = rdkit_desc.featurize(smiles)
print(f"Descriptor shape: {desc_features.shape}") # (4, 208)
# Graph features (for GNN models — returns ConvMol objects)
graph_feat = dc.feat.ConvMolFeaturizer()
graphs = graph_feat.featurize(smiles)
print(f"Atoms in first mol: {graphs[0].get_num_atoms()}") # 3
# Mol2Vec embeddings (pretrained word2vec on molecular substructures)
mol2vec = dc.feat.Mol2VecFingerprint()
embeddings = mol2vec.featurize(smiles)
print(f"Mol2Vec shape: {embeddings.shape}") # (4, 300)
Module 3: Model Training and Evaluation
DeepChem provides MultitaskRegressor and MultitaskClassifier as general-purpose models, plus specialized architectures for graph and sequence data.
import deepchem as dc
import numpy as np
# Load dataset
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="ECFP")
train, valid, test = datasets
# Regression model (fingerprint input)
model = dc.models.MultitaskRegressor(
    n_tasks=1,
    n_features=1024,
    layer_sizes=[1000, 500],
    dropouts=0.25,
    learning_rate=0.001,
    batch_size=64,
)
model.fit(train, nb_epoch=100)
# Evaluate with multiple metrics
metrics = [
    dc.metrics.Metric(dc.metrics.pearson_r2_score),
    dc.metrics.Metric(dc.metrics.mean_absolute_error),
    dc.metrics.Metric(dc.metrics.rms_score),
]
results = model.evaluate(test, metrics)
print(f"R2: {results['pearson_r2_score']:.3f}, MAE: {results['mean_absolute_error']:.3f}")
# Classification model (e.g., Tox21 toxicity prediction)
tasks, datasets, transformers = dc.molnet.load_tox21(featurizer="ECFP")
train, valid, test = datasets
clf = dc.models.MultitaskClassifier(
    n_tasks=len(tasks),
    n_features=1024,
    layer_sizes=[1000, 500],
    dropouts=0.5,
    learning_rate=0.001,
)
clf.fit(train, nb_epoch=50)
roc_metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)
print(f"Mean ROC-AUC: {clf.evaluate(test, [roc_metric])}")
Module 4: Graph Neural Networks
GNNs operate directly on molecular graphs (atoms as nodes, bonds as edges), avoiding information loss from fixed fingerprints.
import deepchem as dc
# Load with graph featurizer
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="GraphConv")
train, valid, test = datasets
# Graph Convolutional Network (Duvenaud et al.)
gcn_model = dc.models.GraphConvModel(
    n_tasks=1,
    mode="regression",
    graph_conv_layers=[64, 64],
    dense_layer_size=256,
    dropout=0.2,
    learning_rate=0.001,
    batch_size=64,
)
gcn_model.fit(train, nb_epoch=100)
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print(f"GCN R2: {gcn_model.evaluate(test, [metric])}")
# AttentiveFP (Xiong et al.) -- attention-based GNN, strong on molecular properties
tasks, datasets, transformers = dc.molnet.load_delaney(
    featurizer=dc.feat.MolGraphConvFeaturizer(use_edges=True)
)
train, valid, test = datasets
attfp_model = dc.models.AttentiveFPModel(
    n_tasks=1,
    mode="regression",
    num_layers=2,
    graph_feat_size=200,
    num_timesteps=2,
    dropout=0.2,
    learning_rate=0.001,
    batch_size=64,
)
attfp_model.fit(train, nb_epoch=100)
print(f"AttentiveFP R2: {attfp_model.evaluate(test, [metric])}")
Module 5: Transfer Learning
Fine-tune pretrained chemical language models for downstream tasks with limited data.
import deepchem as dc
from deepchem.models.torch_models import ChemBERTaModel
# ChemBERTa -- SMILES-based transformer (pretrained on 77M molecules).
# Note: class and featurizer names vary across DeepChem releases; check the
# dc.models.torch_models docs for your installed version.
tasks, datasets, transformers = dc.molnet.load_bbbp(featurizer=dc.feat.SmilesTokenizer())
train, valid, test = datasets
chemberta = ChemBERTaModel(
    task="classification",
    n_tasks=1,
    model_dir="chemberta_finetuned/",
)
# Fine-tune on downstream task (BBB permeability)
chemberta.fit(train, nb_epoch=10)
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
print(f"ChemBERTa ROC-AUC: {chemberta.evaluate(test, [metric])}")
Module 6: Predictions on New Molecules
Run inference on new molecules with a trained model.
import deepchem as dc
import numpy as np
# Assume trained model from Module 3
# Featurize new molecules using same featurizer
featurizer = dc.feat.CircularFingerprint(size=1024, radius=2)
new_smiles = ["c1cc(O)ccc1", "CC(=O)Nc1ccc(O)cc1", "OC(=O)c1ccccc1"]
new_features = featurizer.featurize(new_smiles)
new_dataset = dc.data.NumpyDataset(X=new_features)
predictions = model.predict(new_dataset)
for smi, pred in zip(new_smiles, predictions):
    print(f"{smi}: {pred[0]:.2f}")
# Ensemble predictions from multiple models for robustness
models = [model1, model2, model3] # trained models
all_preds = np.array([m.predict(new_dataset) for m in models])
ensemble_mean = all_preds.mean(axis=0)
ensemble_std = all_preds.std(axis=0)
print(f"Ensemble prediction: {ensemble_mean[0][0]:.2f} +/- {ensemble_std[0][0]:.2f}")
Key Concepts
Unified API Pattern
All DeepChem workflows follow a consistent 5-step pattern:
Load Data → Featurize → Split → Train → Evaluate
- Load: `CSVLoader`, `SDFLoader`, or `dc.molnet.load_*()` (auto-loads MoleculeNet datasets)
- Featurize: Pass a featurizer to the loader, or call `featurizer.featurize(smiles)` directly
- Split: `ScaffoldSplitter` (recommended for drug discovery), `RandomSplitter`, `ButinaSplitter`
- Train: `model.fit(train_dataset, nb_epoch=N)`
- Evaluate: `model.evaluate(test_dataset, metrics_list)`
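The whole pattern in one compact block (a sketch using the built-in Delaney loader, so it runs without local files):
import deepchem as dc
tasks, (train, valid, test), transformers = dc.molnet.load_delaney(
    featurizer="ECFP", splitter="scaffold"       # Load + Featurize + Split
)
model = dc.models.MultitaskRegressor(n_tasks=1, n_features=1024)
model.fit(train, nb_epoch=30)                    # Train
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
print(model.evaluate(test, [metric]))            # Evaluate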
Model Selection Guide
| Data Type | Model | Key Feature | Use When |
|---|---|---|---|
| SMILES + fingerprints | MultitaskRegressor | Fast, baseline | First attempt, small datasets |
| SMILES + fingerprints | MultitaskClassifier | Multi-label | Multi-task classification (Tox21) |
| Molecular graphs | GraphConvModel | Learned fingerprints | Medium datasets, general properties |
| Molecular graphs | GATModel | Attention mechanism | When atom importance matters |
| Molecular graphs | AttentiveFPModel | Graph + timestep attention | State-of-art molecular properties |
| Molecular graphs | MPNNModel | Message passing | Complex molecular interactions |
| Molecular graphs | DMPNNModel | Directed MPNN | Bond-level predictions |
| SMILES strings | ChemBERTaModel | Pretrained transformer | Low-data regime, transfer learning |
| SMILES strings | GROVERModel | Graph + transformer | Rich molecular representations |
| Crystal structures | CGCNNModel | Crystal graph CNN | Materials property prediction |
| Crystal structures | MEGNetModel | Graph networks | Materials and molecules |
| Protein sequences | ProteinLigandComplexModel | Complex modeling | Binding affinity prediction |
| Tabular features | XGBoostModel, RandomForestModel | Classical ML | Interpretability, baselines |
Featurizer Selection Guide
| Featurizer | Class | Output | Best For |
|---|---|---|---|
| ECFP/Morgan | CircularFingerprint | Binary vector (1024-2048) | General QSAR, fast baselines |
| MACCS Keys | MACCSKeysFingerprint | 167-bit vector | Substructure filtering |
| RDKit 2D | RDKitDescriptors | 200+ descriptors | Interpretable models |
| Mol2Vec | Mol2VecFingerprint | 300-dim embedding | Similarity, clustering |
| ConvMol | ConvMolFeaturizer | Graph features | GraphConvModel input |
| MolGraph | MolGraphConvFeaturizer | Node + edge features | AttentiveFPModel, MPNNModel |
| Weave | WeaveFeaturizer | Pair features | WeaveModel input |
| Coulomb Matrix | CoulombMatrix | Atom-pair distances | QM property prediction |
| SMILES tokens | SmilesTokenizer | Token IDs | ChemBERTa, transformer models |
Data Splitting Strategies
| Splitter | Use Case | Why |
|---|---|---|
| ScaffoldSplitter | Drug discovery (default) | Tests generalization to new chemotypes |
| RandomSplitter | Quick experiments | Baseline, but overestimates performance |
| ButinaSplitter | Diversity-based | Clusters by Tanimoto similarity |
| FingerprintSplitter | Chemical similarity | Groups structurally similar molecules |
| MaxMinSplitter | Maximum diversity test | Extreme generalization test |
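All splitters share the same interface, so swapping strategies is a one-line change. A sketch, assuming a featurized dataset such as the one built in Workflow 1 below:
splitter = dc.splits.ScaffoldSplitter()  # or dc.splits.RandomSplitter()
train, valid, test = splitter.train_valid_test_split(
    dataset, frac_train=0.8, frac_valid=0.1, frac_test=0.1
)
# k-fold variant for cross-validation: returns a list of (train, valid) pairs
folds = dc.splits.ScaffoldSplitter().k_fold_split(dataset, k=5)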
Common Workflows
Workflow 1: QSAR from CSV Data
Goal: Build a property prediction model from a CSV file with SMILES and activity columns.
import deepchem as dc
import pandas as pd
# Step 1: Load and featurize CSV data
loader = dc.data.CSVLoader(
    tasks=["pIC50"],
    feature_field="smiles",
    featurizer=dc.feat.CircularFingerprint(size=2048, radius=3),
)
dataset = loader.create_dataset("bioactivity_data.csv")
# Step 2: Normalize targets
transformer = dc.trans.NormalizationTransformer(
    transform_y=True, dataset=dataset
)
dataset = transformer.transform(dataset)
# Step 3: Scaffold split (realistic for drug discovery)
splitter = dc.splits.ScaffoldSplitter()
train, valid, test = splitter.train_valid_test_split(dataset)
print(f"Train: {len(train)}, Valid: {len(valid)}, Test: {len(test)}")
# Step 4: Train model
model = dc.models.MultitaskRegressor(
    n_tasks=1, n_features=2048,
    layer_sizes=[1000, 500], dropouts=0.25,
    learning_rate=0.001, batch_size=64,
)
model.fit(train, nb_epoch=100)
# Step 5: Evaluate
metrics = [
    dc.metrics.Metric(dc.metrics.pearson_r2_score),
    dc.metrics.Metric(dc.metrics.mean_absolute_error),
]
results = model.evaluate(test, metrics)
print(f"R2: {results['pearson_r2_score']:.3f}, MAE: {results['mean_absolute_error']:.3f}")
Workflow 2: MoleculeNet Benchmark Comparison
Goal: Compare multiple models on a MoleculeNet benchmark dataset.
import deepchem as dc
# Load dataset with graph featurizer (supports both fingerprint and GNN models)
tasks, datasets, transformers = dc.molnet.load_bbbp(
    featurizer="GraphConv", splitter="scaffold"
)
train, valid, test = datasets
metric = dc.metrics.Metric(dc.metrics.roc_auc_score)
# Model 1: Graph Convolutional Network
gcn = dc.models.GraphConvModel(n_tasks=1, mode="classification", dropout=0.2)
gcn.fit(train, nb_epoch=50)
gcn_score = gcn.evaluate(test, [metric])
# Model 2: Random Forest baseline (needs fingerprints)
tasks_fp, datasets_fp, _ = dc.molnet.load_bbbp(featurizer="ECFP", splitter="scaffold")
train_fp, _, test_fp = datasets_fp
from sklearn.ensemble import RandomForestClassifier
rf = dc.models.SklearnModel(
    model=RandomForestClassifier(n_estimators=500),
    model_dir="rf_model/"
)
rf.fit(train_fp)
rf_score = rf.evaluate(test_fp, [metric])
print(f"GCN ROC-AUC: {gcn_score['roc_auc_score']:.3f}")
print(f"RF ROC-AUC: {rf_score['roc_auc_score']:.3f}")
Workflow 3: Transfer Learning Pipeline
Goal: Fine-tune a pretrained model on a small dataset.
- Load a pretrained ChemBERTa model (see Module 5 for code)
- Prepare the downstream dataset with the `SmilesTokenizer` featurizer
- Fine-tune with a reduced learning rate (`1e-5` to `5e-5`) for 5-15 epochs
- Evaluate on a held-out scaffold split -- expect gains over fingerprint baselines when training data < 1000 samples
- Save the fine-tuned model: `model.save_checkpoint()`
- See `references/workflows_model_catalog.md` Workflow 1 for complete hyperparameter optimization code
Key Parameters
| Parameter | Module | Default | Range / Options | Effect |
|---|---|---|---|---|
| n_features | MultitaskRegressor/Classifier | Required | Matches featurizer output | Input feature dimension |
| layer_sizes | MultitaskRegressor/Classifier | [1000] | [256] to [1000, 500, 250] | Hidden layer dimensions |
| dropouts | All neural models | 0.0 | 0.0-0.5 | Regularization strength |
| learning_rate | All neural models | 0.001 | 1e-5 to 0.01 | Training step size |
| batch_size | All neural models | 100 | 16-256 | Samples per gradient update |
| nb_epoch | model.fit() | 10 | 10-300 | Training iterations |
| size | CircularFingerprint | 2048 | 512-4096 | Fingerprint bit length |
| radius | CircularFingerprint | 2 | 2-4 | Substructure neighborhood radius |
| graph_conv_layers | GraphConvModel | [64, 64] | [32] to [128, 128, 64] | Graph convolution widths |
| num_layers | AttentiveFPModel | 2 | 1-5 | GNN message passing depth |
| graph_feat_size | AttentiveFPModel | 200 | 64-512 | Graph feature dimension |
| splitter | dc.molnet.load_*() | "scaffold" | "scaffold", "random", "butina" | Data splitting strategy |
Best Practices
- Always use scaffold splitting for drug discovery: Random splits leak structural information and overestimate performance. Scaffold splits test generalization to novel chemotypes.
- Normalize regression targets: Apply `NormalizationTransformer(transform_y=True)` before training. Remember to `untransform()` predictions for interpretable values.
- Start with fingerprint baselines: Train `MultitaskRegressor` + ECFP first. Only move to GNNs if the fingerprint baseline is insufficient -- GNNs need more data and compute.
  # Baseline first
  baseline = dc.models.MultitaskRegressor(n_tasks=1, n_features=2048)
- Match featurizer to model: GNN models require graph featurizers (`ConvMolFeaturizer`, `MolGraphConvFeaturizer`). Fingerprint models need `CircularFingerprint`. Mixing the two causes silent errors.
- Anti-pattern -- Do not use random splits for drug discovery benchmarks: Results with `RandomSplitter` are not publishable for molecular property prediction. Reviewers expect scaffold or temporal splits.
- Handle missing labels in multi-task datasets: Tox21 and many bioactivity datasets have missing values. DeepChem handles NaN labels automatically during training (masked loss), but verify with `np.isnan(dataset.y).sum()`.
- Use early stopping via a validation set: Monitor validation loss to prevent overfitting, especially with GNN models (see the sketch below).
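For the last bullet, dc.models.ValidationCallback scores a validation set every interval training steps and can checkpoint the best weights. A hedged sketch (argument names follow recent DeepChem docs; verify against your installed version):
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
callback = dc.models.ValidationCallback(
    valid,                  # validation dataset
    interval=100,           # evaluate every 100 gradient steps
    metrics=[metric],
    save_dir="best_model/", # checkpoint the best-scoring weights
    save_on_minimum=False,  # pearson_r2_score: higher is better
)
model.fit(train, nb_epoch=100, callbacks=[callback])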
Common Recipes
Recipe: Hyperparameter Search
When to use: Optimize model performance before final evaluation.
import deepchem as dc
tasks, datasets, transformers = dc.molnet.load_delaney(featurizer="ECFP")
train, valid, test = datasets
# Define parameter grid
params = {
    "n_features": [1024],
    "layer_sizes": [[500], [1000, 500], [1000, 500, 250]],
    "dropouts": [0.1, 0.25, 0.5],
    "learning_rate": [0.001, 0.0005],
}
# Fix n_tasks in the builder; the grid supplies the remaining arguments
optimizer = dc.hyper.GridHyperparamOpt(lambda **p: dc.models.MultitaskRegressor(n_tasks=1, **p))
metric = dc.metrics.Metric(dc.metrics.pearson_r2_score)
best_model, best_params, all_results = optimizer.hyperparam_search(
    params, train, valid, metric, logdir="hyperparam_logs/"
)
print(f"Best params: {best_params}")
print(f"Best R2: {best_model.evaluate(test, [metric])}")
Recipe: Save and Reload Models
When to use: Deploy trained models or resume training.
import deepchem as dc
# Save model checkpoint
model.save_checkpoint(model_dir="saved_model/")
# Reload: rebuild the same architecture, then restore the weights
loaded_model = dc.models.MultitaskRegressor(n_tasks=1, n_features=2048)
loaded_model.restore(model_dir="saved_model/")
predictions = loaded_model.predict(test)
Recipe: Custom Metric
When to use: Evaluate models with domain-specific metrics.
import deepchem as dc
import numpy as np
def enrichment_factor(y_true, y_pred, top_fraction=0.01):
    """Enrichment factor in the top X% of ranked predictions (y_true: binary hits)."""
    n = len(y_true)
    n_top = max(int(n * top_fraction), 1)
    top_indices = np.argsort(y_pred.flatten())[-n_top:]  # highest-scoring molecules
    hits_in_top = y_true.flatten()[top_indices].sum()
    expected = y_true.sum() * top_fraction               # hits expected by chance
    return hits_in_top / expected if expected > 0 else 0.0
ef_metric = dc.metrics.Metric(enrichment_factor, mode="regression")
print(f"EF@1%: {model.evaluate(test, [ef_metric])}")
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| ModuleNotFoundError: torch | PyTorch not installed | pip install deepchem[torch] for GNN models |
| ValueError: n_features mismatch | Featurizer output size does not match model n_features | Check dataset.X.shape[1] and set n_features accordingly |
| NaN loss during training | Learning rate too high or unnormalized targets | Apply NormalizationTransformer, reduce learning rate to 1e-4 |
| Low scaffold-split performance | Model memorizes scaffolds, not properties | Use more data, try GNN models, or add regularization (dropout 0.3-0.5) |
| RuntimeError: CUDA out of memory | Batch size too large for GPU | Reduce batch_size (32 or 16), or use CPU for small datasets |
| FeaturizationError on some SMILES | Invalid or complex SMILES strings | Pre-filter with RDKit: Chem.MolFromSmiles(smi) is not None |
| Model predicts constant values | Targets not normalized or too few epochs | Apply NormalizationTransformer, increase nb_epoch |
| Slow featurization | Large dataset with expensive featurizer | Use CircularFingerprint (fast) or parallelize with the n_jobs parameter |
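For the invalid-SMILES row above, a minimal pre-filter sketch with RDKit (bundled with the core install; the file and column names match the Module 1 example and are illustrative):
import pandas as pd
from rdkit import Chem
df = pd.read_csv("solubility_data.csv")
valid_mask = df["smiles"].apply(lambda s: Chem.MolFromSmiles(s) is not None)
print(f"Dropping {(~valid_mask).sum()} invalid SMILES")
df[valid_mask].to_csv("solubility_clean.csv", index=False)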
Bundled Resources
- references/workflows_model_catalog.md -- Extended workflows (hyperparameter optimization with full code, MolGAN generative models, materials property prediction with CGCNN/MEGNet, protein-ligand modeling, custom model architectures), the complete model catalog (60+ models organized by category), the complete featurizer catalog (50+ featurizers), and the MoleculeNet dataset catalog. The top three workflows (QSAR, MoleculeNet benchmark, transfer learning) appear inline under Common Workflows; the core model and featurizer tables are under Key Concepts.
Related Skills
- rdkit-cheminformatics -- molecular manipulation, fingerprints, substructure search (upstream featurization)
- molfeat-molecular-featurization -- 100+ featurizers with scikit-learn API (featurization-only alternative)
- datamol-cheminformatics -- Pythonic molecular processing (upstream data prep)
- pytdc-therapeutics-data-commons -- curated ADMET/DTI datasets with standardized splits (complementary data source)
- torch-geometric-graph-neural-networks -- lower-level PyG for custom GNN architectures (alternative for advanced users)
- scikit-learn-machine-learning -- classical ML baselines that DeepChem wraps via `SklearnModel`
References
- DeepChem documentation -- official API docs and tutorials
- DeepChem GitHub -- source code, examples, issues
- MoleculeNet benchmark paper -- Wu et al. 2018, benchmark dataset descriptions
- DeepChem tutorials -- Jupyter notebook tutorials