name: "autodock-vina-docking" description: "Molecular docking with AutoDock Vina (Python API). Receptor/ligand prep (Meeko + RDKit), grid box, docking, pose and binding energy analysis, and batch virtual screening." license: "CC-BY-4.0"
AutoDock Vina Molecular Docking
Overview
AutoDock Vina is one of the fastest and most widely used open-source molecular docking engines for predicting protein–ligand binding modes and affinities. This skill covers the full Python-based pipeline: receptor preparation from PDB, ligand preparation from SMILES/SDF via Meeko and RDKit, search box definition, docking execution, pose analysis, and batch virtual screening for hit identification.
When to Use
- Predicting binding poses of small molecules to a protein target
- Estimating relative binding affinities (kcal/mol) for ligand ranking
- Virtual screening of compound libraries against a target receptor
- Validating docking protocols by re-docking co-crystallized ligands
- Preparing docking inputs from SMILES strings without intermediate files
- Comparing binding modes of analogs in a structure-activity relationship study
- Generating starting poses for molecular dynamics simulations
- Use DiffDock instead for blind docking when the binding site is unknown; use GNINA as an alternative with CNN scoring
Prerequisites
- Python packages:
vina,meeko,rdkit,prody(for PDB fetch),py3Dmol(for visualization) - External tools:
ADFR Suite(providesprepare_receptorfor PDBQT conversion) — download from https://ccsb.scripps.edu/adfr/downloads/ - Data requirements: Protein structure (PDB file or PDB ID), ligand(s) as SMILES, SDF, or MOL2
- Environment: Python 3.8+, Linux or macOS recommended
pip install vina meeko rdkit-pypi prody py3Dmol
# ADFR Suite must be installed separately for prepare_receptor
Workflow
Step 1: Fetch and Prepare the Receptor
Download the protein structure, remove water/heteroatoms, and convert to PDBQT format.
import prody
import subprocess
from pathlib import Path
# Download PDB structure (example: HIV-1 protease, PDB 1HPV)
pdb_id = "1HPV"
pdb_file = f"{pdb_id}.pdb"
prody.fetchPDB(pdb_id, compressed=False)
# Extract protein chain only (remove water and ligands)
structure = prody.parsePDB(pdb_file)
protein = structure.select("protein")
prody.writePDB(f"{pdb_id}_protein.pdb", protein)
print(f"Protein atoms: {protein.numAtoms()}")
# Convert to PDBQT using ADFR Suite's prepare_receptor
receptor_pdbqt = f"{pdb_id}_receptor.pdbqt"
subprocess.run([
"prepare_receptor",
"-r", f"{pdb_id}_protein.pdb",
"-o", receptor_pdbqt,
"-A", "hydrogens", # add hydrogens
], check=True)
print(f"Receptor PDBQT: {receptor_pdbqt}")
Step 2: Identify the Binding Site
Define the docking search box centered on the known binding site or co-crystallized ligand.
import numpy as np
# Option A: Center on co-crystallized ligand coordinates
structure = prody.parsePDB(pdb_file)
ligand = structure.select("hetero and not water and not ion")
if ligand is not None:
center = ligand.getCoords().mean(axis=0)
# Box size = ligand extent + padding
extent = ligand.getCoords().max(axis=0) - ligand.getCoords().min(axis=0)
box_size = extent + 10.0 # 10 Å padding on each side
print(f"Binding site center: {center}")
print(f"Box size: {box_size}")
else:
# Option B: Manual coordinates (from literature or visual inspection)
center = np.array([15.0, 54.0, 17.0])
box_size = np.array([25.0, 25.0, 25.0])
print(f"Using manual box: center={center}, size={box_size}")
Step 3: Prepare the Ligand from SMILES
Use RDKit for 3D coordinate generation and Meeko for PDBQT conversion.
from rdkit import Chem
from rdkit.Chem import AllChem
from meeko import MoleculePreparation, PDBQTWriterLegacy
# Define ligand (example: Indinavir, HIV protease inhibitor)
smiles = "CC(C)(C)NC(=O)[C@@H]1CN(CCc2ccccc2)C[C@H]1O"
mol_name = "indinavir_analog"
# Generate 3D coordinates with RDKit
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)
AllChem.MMFFOptimizeMolecule(mol)
# Convert to PDBQT using Meeko
preparator = MoleculePreparation()
mol_setups = preparator.prepare(mol)
# Write PDBQT string (for Vina API) or file
pdbqt_string = PDBQTWriterLegacy.write_string(mol_setups[0])[0]
ligand_pdbqt = f"{mol_name}.pdbqt"
with open(ligand_pdbqt, "w") as f:
f.write(pdbqt_string)
print(f"Ligand PDBQT: {ligand_pdbqt} ({mol.GetNumAtoms()} atoms)")
Step 4: Run Docking
Initialize AutoDock Vina, configure the search space, and execute docking.
from vina import Vina
# Initialize Vina
v = Vina(sf_name="vina", cpu=4)
# Load receptor and ligand
v.set_receptor(receptor_pdbqt)
v.set_ligand_from_file(ligand_pdbqt)
# Define search space
v.compute_vina_maps(
center=center.tolist(),
box_size=box_size.tolist(),
)
# Run docking
v.dock(
exhaustiveness=32, # Higher = more thorough (default 8)
n_poses=10, # Number of output poses
)
# Write output poses
output_file = f"{mol_name}_docked.pdbqt"
v.write_poses(output_file, n_poses=10, overwrite=True)
print(f"Docking complete. Poses written to {output_file}")
Step 5: Analyze Docking Results
Extract binding energies and RMSD values from the docked poses using Vina's built-in API.
# Method A: Vina built-in energies (most reliable)
energies = v.energies(n_poses=10)
# Each row: [total, inter, intra, torsions, intra_best_pose]
print(f"{'Pose':<6} {'Total (kcal/mol)':<18} {'Inter':<10} {'Intra':<10}")
print("-" * 44)
for i, e in enumerate(energies):
print(f"{i+1:<6} {e[0]:<18.2f} {e[1]:<10.2f} {e[2]:<10.2f}")
print(f"\nBest pose: {energies[0][0]:.2f} kcal/mol")
# Method B: Parse PDBQT output with Meeko (for RDKit conversion)
from meeko import PDBQTMolecule, RDKitMolCreate
pdbqt_mol = PDBQTMolecule.from_file(output_file)
best_pose_rdkit = RDKitMolCreate.from_pdbqt_mol(pdbqt_mol)[0]
print(f"Best pose converted to RDKit mol: {best_pose_rdkit.GetNumAtoms()} atoms")
Step 6: Visualize Docking Results
Generate a 3D visualization of the docked complex using py3Dmol.
import py3Dmol
# Load receptor and best docked pose
with open(f"{pdb_id}_protein.pdb") as f:
receptor_pdb = f.read()
with open(output_file) as f:
docked_pdbqt = f.read()
# Create 3D viewer
view = py3Dmol.view(width=800, height=600)
view.addModel(receptor_pdb, "pdb")
view.setStyle({"model": 0}, {"cartoon": {"color": "lightgrey"}})
# Add docked ligand (first model only)
first_model = docked_pdbqt.split("ENDMDL")[0] + "ENDMDL"
view.addModel(first_model, "pdb")
view.setStyle({"model": 1}, {"stick": {"colorscheme": "greenCarbon"}})
# Zoom to ligand
view.zoomTo({"model": 1})
view.show()
# In Jupyter: displays interactive 3D view
# To save: view.png() or view.write_html("docking_result.html")
print("3D visualization rendered")
Step 7: Batch Virtual Screening
Screen a library of compounds against the same receptor.
import pandas as pd
# Define compound library
compounds = pd.DataFrame({
"name": ["cpd_001", "cpd_002", "cpd_003", "cpd_004", "cpd_005"],
"smiles": [
"CC(=O)Oc1ccccc1C(=O)O", # Aspirin
"CC(C)Cc1ccc(cc1)C(C)C(=O)O", # Ibuprofen
"OC(=O)c1ccccc1O", # Salicylic acid
"CC12CCC3C(CCC4CC(=O)CCC34C)C1CCC2O", # Testosterone
"c1ccc2c(c1)cc1ccc3cccc4ccc2c1c34", # Pyrene
],
})
# Screen all compounds
results = []
for _, row in compounds.iterrows():
try:
# Prepare ligand
mol = Chem.MolFromSmiles(row["smiles"])
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)
AllChem.MMFFOptimizeMolecule(mol)
mol_setups = preparator.prepare(mol)
pdbqt_str = PDBQTWriterLegacy.write_string(mol_setups[0])[0]
# Dock
v_screen = Vina(sf_name="vina", cpu=2)
v_screen.set_receptor(receptor_pdbqt)
v_screen.set_ligand_from_string(pdbqt_str)
v_screen.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist())
v_screen.dock(exhaustiveness=16, n_poses=1)
energy = v_screen.energies(n_poses=1)[0][0]
results.append({"name": row["name"], "smiles": row["smiles"], "energy_kcal": energy})
print(f" {row['name']}: {energy:.2f} kcal/mol")
except Exception as e:
results.append({"name": row["name"], "smiles": row["smiles"], "energy_kcal": None})
print(f" {row['name']}: FAILED ({e})")
# Rank by binding energy
results_df = pd.DataFrame(results).sort_values("energy_kcal")
results_df.to_csv("screening_results.csv", index=False)
print(f"\nTop hits:\n{results_df.head()}")
Step 8: Save and Export
import os
os.makedirs("results", exist_ok=True)
# Save summary
results_df.to_csv("results/screening_results.csv", index=False)
# Save best poses for top hits
for _, row in results_df.head(3).iterrows():
print(f"Top hit: {row['name']} → {row['energy_kcal']:.2f} kcal/mol")
print("Virtual screening complete. Results in results/screening_results.csv")
Key Parameters
| Parameter | Default | Range / Options | Effect |
|---|---|---|---|
exhaustiveness | 8 | 8-128 | Search thoroughness; 32+ recommended for publication |
n_poses | 9 | 1-20 | Number of output binding poses |
energy_range | 3.0 | 1.0-5.0 | Max energy difference (kcal/mol) from best pose to include |
sf_name | "vina" | "vina", "ad4", "vinardo" | Scoring function choice |
cpu | all | 1-N | Number of CPU cores for docking |
box_size (xyz) | — | 15-30 Å per side | Search space dimensions; must enclose binding site + 5-10Å padding |
center (xyz) | — | Binding site coordinates | Center of the search box |
randomSeed (RDKit) | random | any int | Reproducible 3D conformer generation |
padding (box) | 10.0 Å | 5.0-15.0 Å | Extra space around known ligand for box definition |
Common Recipes
Recipe: Re-docking Validation (Cognate Docking)
When to use: validating your protocol by re-docking the co-crystallized ligand and checking RMSD < 2.0 Å.
from rdkit.Chem import AllChem, rdMolAlign
# Extract co-crystallized ligand from PDB
ref_ligand = Chem.MolFromPDBFile(f"{pdb_id}_ligand.pdb", removeHs=False)
# Dock the same ligand
# ... (use steps 3-4 above with the extracted ligand)
# Calculate RMSD between docked pose and crystal structure
rmsd = AllChem.GetBestRMS(ref_ligand, best_pose_rdkit)
print(f"Re-docking RMSD: {rmsd:.2f} Å")
print(f"Validation: {'PASS' if rmsd < 2.0 else 'FAIL'} (threshold: 2.0 Å)")
Recipe: Flexible Receptor Docking
When to use: key binding-site residues need conformational freedom (e.g., induced fit).
# Prepare receptor with flexible sidechains (using ADFR Suite)
# prepare_receptor -r protein.pdb -o rigid.pdbqt -A hydrogens
# prepare_flexreceptor -r rigid.pdbqt -s "A:ARG8,A:ASP25,A:ILE50"
v_flex = Vina(sf_name="vina", cpu=4)
v_flex.set_receptor("rigid.pdbqt", "flex.pdbqt") # rigid + flexible parts
v_flex.set_ligand_from_file(ligand_pdbqt)
v_flex.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist())
v_flex.dock(exhaustiveness=64, n_poses=10)
v_flex.write_poses("docked_flex.pdbqt", n_poses=5, overwrite=True)
Recipe: Scoring Only (No Docking)
When to use: evaluating the binding energy of a pre-positioned ligand without running a full search.
v_score = Vina(sf_name="vina")
v_score.set_receptor(receptor_pdbqt)
v_score.set_ligand_from_file("pre_positioned_ligand.pdbqt")
v_score.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist())
# Score current pose
energy = v_score.score()
print(f"Score: {energy[0]:.2f} kcal/mol")
# Local minimization
energy_min = v_score.optimize()
print(f"After local optimization: {energy_min[0]:.2f} kcal/mol")
v_score.write_pose("minimized.pdbqt", overwrite=True)
Recipe: Multiple Scoring Functions Comparison
When to use: consensus scoring to increase confidence in docking results.
scoring_results = {}
for sf in ["vina", "vinardo", "ad4"]:
v_sf = Vina(sf_name=sf, cpu=2)
v_sf.set_receptor(receptor_pdbqt)
v_sf.set_ligand_from_file(ligand_pdbqt)
v_sf.compute_vina_maps(center=center.tolist(), box_size=box_size.tolist())
v_sf.dock(exhaustiveness=16, n_poses=1)
scoring_results[sf] = v_sf.energies(n_poses=1)[0][0]
for sf, energy in scoring_results.items():
print(f" {sf}: {energy:.2f} kcal/mol")
Expected Outputs
{name}_docked.pdbqt— Docked poses in PDBQT format with binding energies in headerresults/screening_results.csv— Virtual screening results: compound name, SMILES, binding energy (kcal/mol){pdb_id}_receptor.pdbqt— Prepared receptor in PDBQT format- Figures: 3D docking visualization (interactive py3Dmol or static image)
- Console output: ranked binding energies and RMSD values per pose
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
prepare_receptor not found | ADFR Suite not in PATH | Add ADFR Suite bin to $PATH or use full path |
RuntimeError: receptor not set | Forgot to call set_receptor | Call v.set_receptor(pdbqt_file) before docking |
| Very positive docking scores (>0) | Ligand outside box or bad geometry | Check box center/size covers binding site; verify 3D coords |
| All poses identical | exhaustiveness too low | Increase to 32-64 for reliable sampling |
Meeko MoleculePreparation error | Missing hydrogens on input mol | Always call Chem.AddHs(mol) before Meeko |
| RMSD > 2Å in re-docking | Box too small or wrong center | Expand box by 5Å; verify center on co-crystallized ligand |
EmbedMolecule returns -1 | RDKit failed to generate 3D coords | Use AllChem.EmbedMolecule(mol, maxAttempts=1000) or try useRandomCoords=True |
| Slow screening (>1min/compound) | High exhaustiveness + large box | Reduce exhaustiveness to 8-16 for screening; narrow box |
PDBQTWriterLegacy not found | Old Meeko version | pip install meeko>=0.5 — API changed from write_pdbqt_string |
| Inconsistent energies across runs | Non-deterministic search | Set seed parameter in v.dock(seed=42) for reproducibility |
Bundled Resources
This skill includes reference files for deeper lookup. Read these on demand.
references/receptor_preparation_guide.md
Detailed guide for receptor preparation: handling missing residues, protonation states (pH-dependent), metal ions, cofactors, and multi-chain complexes. Decision tree for when to use PDB2PQR, PROPKA, or manual protonation.
references/scoring_functions_comparison.md
Comparison of Vina, Vinardo, and AD4 scoring functions: accuracy benchmarks, speed trade-offs, and recommendations by target class (kinase, protease, GPCR, nuclear receptor).
References
- AutoDock Vina documentation — Official docs and Python API (Apache-2.0)
- Eberhardt et al. (2021) — "AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings", J Chem Inf Model
- Meeko: Preparation of small molecules for AutoDock — Ligand PDBQT preparation from RDKit (LGPL)
- ADFR Suite — Receptor preparation tools from Scripps (free academic license)
- Forli et al. (2016) — "Computational protein-ligand docking and virtual drug screening with the AutoDock suite", Nat Protoc
- RDKit documentation — Cheminformatics toolkit for ligand handling (BSD)