Unified Computational Platform for Structure-Based Drug Discovery
Integrating state-of-the-art AI models—Boltz-1, OpenFold, DiffDock, and AutoDock Vina—with deep learning-based virtual screening and multi-parameter ADMET optimization for accelerated hit identification and lead optimization.
Real-time 3D visualization of AI-predicted protein structures and protein-ligand complexes. Upload your own PDB files or explore our pre-computed examples from Boltz-1 predictions and docking simulations.
Cyclooxygenase-1 (COX-1) structure predicted using Boltz-1, MIT's open-source biomolecular structure prediction model achieving AlphaFold3-level accuracy. COX-1 is a key therapeutic target for anti-inflammatory drugs and represents a challenging prediction task due to its large size (>500 residues) and complex topology.
Validation: Predicted structure shows excellent agreement with crystallographic data (PDB: 1PRH), with backbone RMSD < 2.0 Å and >90% of residues in favored Ramachandran regions.
Explore pre-computed protein-ligand complexes or upload custom PDB structures for real-time visualization. Our default example shows mPGES-1 trimer bound to compound mol_941962 at three allosteric sites, generated using hybrid DiffDock + AutoDock Vina protocols.
ProteinLab.ai integrates cutting-edge computational tools and deep learning models, each validated against extensive benchmark datasets and published in peer-reviewed journals.
Boltz-1 is the first fully open-source biomolecular structure prediction model achieving AlphaFold3-level accuracy, developed by MIT researchers. Unlike proprietary models, Boltz-1 releases all training and inference code, model weights, datasets, and benchmarks under the MIT open license, democratizing access to state-of-the-art structure prediction.
The model demonstrates exceptional performance on protein-ligand and protein-protein complexes, achieving an LDDT-PLI of 65% on CASP15 targets (vs. 40% for Chai-1) and producing interfaces with DockQ > 0.23 for 83% of complexes (vs. 76% for Chai-1). Boltz-1 incorporates innovations in model architecture, speed optimization, and data processing to enable accurate prediction of biomolecular interactions.
Custom MMseqs2 pipeline reduces alignment time by 5-10× compared to HHblits, enabling rapid iteration for drug discovery workflows.
Median TM-score of 0.92 on CASP15 targets, with >95% of predictions having backbone RMSD < 3.0 Å from native structures.
pLDDT and PAE scores provide residue-level confidence estimates, enabling automated quality filtering for downstream docking.
Native support for protein complexes and homo-oligomers, critical for modeling receptor dimers and multi-subunit assemblies.
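The confidence-based quality filtering described above can be sketched as a simple gate on per-residue pLDDT scores. This is an illustrative sketch, not Boltz-1's actual output API: the input is assumed to be a plain array of pLDDT values on the 0-100 scale, and the thresholds are hypothetical defaults.

```python
import numpy as np

def filter_by_confidence(plddt, residue_threshold=70.0, mean_threshold=80.0):
    """Decide whether a predicted structure is confident enough for docking.

    plddt: per-residue pLDDT scores on the 0-100 scale (assumed input format;
    the real Boltz-1 output schema may differ).
    Returns (passes, mask): a global accept/reject flag and a per-residue
    mask marking residues trusted individually.
    """
    plddt = np.asarray(plddt, dtype=float)
    mask = plddt >= residue_threshold          # residues trusted individually
    passes = bool(plddt.mean() >= mean_threshold)  # global acceptance criterion
    return passes, mask

# Example: a mostly confident prediction with one low-confidence loop residue
ok, mask = filter_by_confidence([92.1, 88.5, 95.0, 61.2, 90.3])
```

In a pipeline, the mask could additionally be used to trim flexible termini before defining the docking box.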
All Boltz-1 predictions undergo rigorous quality assessment using the model's built-in confidence metrics (pLDDT, pTM, ipTM).
Our in-house Boltz-1 prediction of human cyclooxygenase-1 (target 6Y3C_1, predicted from the human FASTA sequence) demonstrates the model's accuracy on therapeutic targets:
- Overall Confidence: 91.6%
- Predicted TM-score (pTM): 94.6%
- CA Deviation (± pLDDT spread): < 2.2 ± 0.34 Å
- Interface pTM (ipTM): 0.0% (monomeric prediction, so no interfaces are scored)
Template-Based Modeling:
✓ Assessment: This prediction demonstrates Boltz-1's ability to produce highly reliable structures suitable for structure-based drug design, with pTM scores exceeding 90% indicating near-experimental quality.
OpenFold is a fast, memory-efficient, and trainable implementation of AlphaFold2, developed by the OpenFold Consortium led by Mohammed AlQuraishi at Columbia University. Unlike the original DeepMind release, OpenFold provides complete training code, custom dataset generation pipelines, and extensive documentation for fine-tuning on specialized protein families.
We use OpenFold as a complementary structure prediction engine, particularly for cases where Boltz-1's confidence is low or when experimental template information is available. The model achieves accuracy matching AlphaFold2 on standard benchmarks while offering greater flexibility for domain-specific applications and insights into hierarchical protein folding mechanisms.
Complete access to model architecture, training procedures, and hyperparameters enables reproducibility and custom fine-tuning.
Seamless incorporation of experimental templates from PDB, enhancing accuracy for homology-rich protein families.
Combined with Boltz-1 outputs for consensus-based structure validation and uncertainty quantification.
Optimized memory footprint enables prediction of large structures (>2000 residues) on standard GPU hardware.
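The consensus-based validation mentioned above (comparing Boltz-1 and OpenFold outputs) can be sketched as a per-residue agreement check. This is a simplified illustration: it assumes both models' CA coordinates are already superposed in the same frame (in practice a Kabsch alignment would come first), and the 2.0 Å cutoff is a hypothetical default.

```python
import numpy as np

def per_residue_deviation(ca_model_a, ca_model_b):
    """Per-residue CA-CA distance (Å) between two superposed predictions.

    ca_model_a, ca_model_b: (n_residues, 3) arrays of CA coordinates,
    assumed already aligned to a common frame.
    """
    a = np.asarray(ca_model_a, dtype=float)
    b = np.asarray(ca_model_b, dtype=float)
    return np.linalg.norm(a - b, axis=1)

def consensus_mask(deviation, cutoff=2.0):
    """Residues where the two engines agree within `cutoff` Å."""
    return deviation <= cutoff

# Example: the two engines agree on residue 1 but disagree on residue 2
dev = per_residue_deviation([[0, 0, 0], [1, 0, 0]], [[0, 0, 0], [1, 0, 3]])
```

Residues outside the consensus mask can be flagged as high-uncertainty regions and excluded from pocket definition.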
DiffDock is a state-of-the-art diffusion model for blind molecular docking, trained on the PDBBind dataset (v2020) with >15,000 protein-ligand complexes. Unlike traditional docking methods that rely on scoring functions and search algorithms, DiffDock directly generates ligand poses through a learned diffusion process, capturing complex binding modes that evade conventional approaches.
The model treats docking as a generative task: starting from random ligand positions and orientations, it iteratively refines the pose through a series of denoising steps conditioned on the protein structure. This approach achieves >38% success rate (RMSD < 2.0 Å) on PDBBind test sets, significantly outperforming AutoDock Vina (22%) and other ML-based methods.
Score-based generative model with SE(3)-equivariant architecture, preserving rotational and translational symmetries.
Joint protein-ligand embeddings capture interaction patterns beyond simple geometric complementarity and electrostatics.
Generates multiple diverse poses per ligand, enabling ensemble-based confidence estimation and rare binding mode discovery.
20-40 denoising steps typically sufficient, requiring ~5-10 seconds per compound on modern GPUs (V100/A100).
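The iterative denoising idea behind DiffDock can be illustrated with a deliberately simplified toy loop. The real model denoises full SE(3) poses plus ligand torsions using a learned score network; here a hand-written `score_fn` stands in for the network and only the ligand centroid (translation) is refined, purely to show the anneal-and-refine structure of the reverse diffusion process.

```python
import numpy as np

def toy_reverse_diffusion(start_pos, score_fn, n_steps=20,
                          step_size=0.3, noise=0.05, seed=0):
    """Illustrative reverse-diffusion loop for a ligand centroid (translation only).

    score_fn: stand-in for DiffDock's learned score network; returns a
    direction toward lower-energy positions. Noise is annealed to zero
    over the trajectory, mimicking the coarse-to-fine denoising schedule.
    """
    rng = np.random.default_rng(seed)
    pos = np.asarray(start_pos, dtype=float)
    for t in range(n_steps):
        pos = pos + step_size * score_fn(pos)                           # follow the score
        pos = pos + noise * (1 - t / n_steps) * rng.standard_normal(3)  # annealed noise
    return pos

# Stand-in score: pull toward a hypothetical pocket center at the origin
pocket = np.zeros(3)
final = toy_reverse_diffusion(np.array([5.0, -4.0, 3.0]), lambda p: pocket - p)
```

After ~20 steps the centroid has converged near the pocket center, echoing why 20-40 denoising steps typically suffice in practice.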
DiffDock has been rigorously evaluated on multiple independent test sets.
AutoDock Vina is one of the most widely used molecular docking programs, cited over 17,000 times since its 2010 release. Vina combines an empirical, knowledge-informed scoring function with efficient gradient-based local optimization, striking an excellent balance between speed and accuracy. The latest major version (1.2.0) adds new scoring function options (AutoDock4, Vinardo), simultaneous docking of multiple ligands, and Python bindings.
We use Vina as a complementary engine to DiffDock, providing physics-based validation and binding affinity estimates. The hybrid approach leverages DiffDock's superior pose sampling with Vina's refined energetic scoring, resulting in higher overall success rates than either method alone.
Vina's scoring function combines multiple terms empirically weighted to reproduce experimental binding affinities:
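A simplified sketch of those terms, with weights as reported in the original Vina publication (Trott & Olson, 2010), is shown below. Atom typing, distance cutoffs, and the full machinery of the real implementation are omitted; `d` is the surface distance (interatomic distance minus the sum of the two van der Waals radii).

```python
import math

# Empirical weights from the original AutoDock Vina publication (Trott & Olson, 2010)
W_GAUSS1, W_GAUSS2, W_REPULSION = -0.0356, -0.00516, 0.840
W_HYDROPHOBIC, W_HBOND, W_NROT = -0.0351, -0.587, 0.0585

def vina_pair_score(d, hydrophobic=False, hbond=False):
    """Weighted pairwise interaction score at surface distance d (Å)."""
    score = W_GAUSS1 * math.exp(-(d / 0.5) ** 2)            # attractive gaussian 1
    score += W_GAUSS2 * math.exp(-((d - 3.0) / 2.0) ** 2)   # attractive gaussian 2
    if d < 0:
        score += W_REPULSION * d * d                         # steric clash penalty
    if hydrophobic:
        # linearly interpolated between 1 (d <= 0.5) and 0 (d >= 1.5)
        score += W_HYDROPHOBIC * min(1.0, max(0.0, 1.5 - d))
    if hbond:
        # linearly interpolated between 1 (d <= -0.7) and 0 (d >= 0)
        score += W_HBOND * min(1.0, max(0.0, -d / 0.7))
    return score

def vina_affinity(pairwise_sum, n_rotatable):
    """Conformational-entropy penalty for ligand flexibility."""
    return pairwise_sum / (1.0 + W_NROT * n_rotatable)
```

Note how favorable contacts near ideal surface distance score negative while clashes (d < 0) score strongly positive, and how rotatable bonds dilute the final affinity.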
This function achieves Pearson R = 0.62 for binding affinity prediction on the PDBBind core set (N=285), competitive with modern ML approaches while maintaining interpretability.
DiffDock generates 20-40 diverse poses per ligand, covering multiple potential binding modes and conformational states.
Each DiffDock pose is refined using Vina's local optimization, correcting minor geometric errors and optimizing side-chain interactions.
Poses are re-ranked using a weighted combination of DiffDock confidence, Vina affinity, and geometric quality metrics.
Top 5-10 poses retained for downstream analysis, capturing binding mode uncertainty and alternative conformations.
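The re-ranking step above can be sketched as a weighted composite over the three signals. The weights and the affinity rescaling below are illustrative placeholders, not the platform's tuned values, and the pose dictionary keys are hypothetical names.

```python
def rerank_poses(poses, w_conf=0.4, w_affinity=0.4, w_geom=0.2, top_k=5):
    """Consensus re-ranking of docked poses (illustrative weights, not tuned values).

    Each pose is a dict with hypothetical keys:
      'diffdock_confidence' - higher is better, assumed in [0, 1]
      'vina_affinity'       - kcal/mol, more negative is better
      'geom_quality'        - higher is better, assumed in [0, 1]
    """
    def composite(p):
        # Negate and roughly rescale Vina affinity so larger composite = better
        return (w_conf * p['diffdock_confidence']
                + w_affinity * (-p['vina_affinity'] / 10.0)
                + w_geom * p['geom_quality'])
    return sorted(poses, key=composite, reverse=True)[:top_k]

poses = [
    {'id': 'A', 'diffdock_confidence': 0.9, 'vina_affinity': -9.0, 'geom_quality': 0.8},
    {'id': 'B', 'diffdock_confidence': 0.5, 'vina_affinity': -6.0, 'geom_quality': 0.9},
]
ranked = rerank_poses(poses)
```

Keeping the top 5-10 poses rather than a single winner preserves alternative binding modes for downstream analysis.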
Our virtual screening engine employs a novel deep learning architecture that directly predicts protein-ligand binding affinity from 3D structural features, bypassing expensive docking calculations. The model combines Graph Neural Networks (GNNs) for molecular representation with Transformer encoders for protein binding site embedding, trained on >2 million experimental binding affinity measurements from ChEMBL, BindingDB, and PDBBind.
Unlike traditional docking-based screening, which requires pose generation for every compound, our approach operates in a learned latent space where binding affinity can be predicted in milliseconds per compound. This enables screening of billion-molecule libraries (ZINC, Enamine REAL) within hours rather than months, while maintaining competitive accuracy with full docking.
Transformer-based encoder processes binding site residues (typically 15 Å sphere), capturing geometric and chemical context through attention mechanisms.
Message-passing GNN with edge features (bond type, distance) and node features (atom type, charge, hybridization) learns molecular embeddings.
Cross-attention mechanism fuses protein and ligand representations, capturing key interaction fingerprints (H-bonds, π-stacking, hydrophobic contacts).
Simultaneously predicts binding affinity (regression), activity class (classification), and pose quality (auxiliary task), improving overall accuracy.
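The cross-attention fusion step can be sketched in miniature. This is a single-head, weight-free illustration of the mechanism only: a trained model would apply learned query/key/value projections and multiple heads, and the embedding dimensions here are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(ligand_emb, protein_emb):
    """Ligand atoms attend over binding-site residues (single head, no learned weights).

    ligand_emb:  (n_atoms, d) ligand-atom embeddings (from the GNN)
    protein_emb: (n_res, d)   residue embeddings (from the transformer encoder)
    Returns one protein-context vector per ligand atom.
    """
    d = ligand_emb.shape[1]
    attn = softmax(ligand_emb @ protein_emb.T / np.sqrt(d), axis=1)  # (n_atoms, n_res)
    return attn @ protein_emb

# Pooled interaction representation that would feed the multi-task heads
rng = np.random.default_rng(0)
lig, prot = rng.standard_normal((6, 16)), rng.standard_normal((20, 16))
interaction = cross_attention(lig, prot).mean(axis=0)  # (16,) pooled vector
```

Mean-pooling over atoms yields a fixed-size interaction vector regardless of ligand size, which is what makes millisecond-scale screening in a latent space possible.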
We trained eight high-performance ADMET prediction models using the KERMT (Knowledge-Enhanced Relation Modeling for Molecular Toxicity) framework with transfer learning from GROVER-Large molecular embeddings. Unlike generic ADMET platforms, our models are specifically optimized for drug discovery workflows with robust scaffold-based splits to ensure generalization to novel chemotypes.
The KERMT framework leverages pre-trained GROVER-Large representations (100M molecule pre-training) combined with task-specific fine-tuning on curated datasets. Scaffold-based splitting ensures that train/test molecules have different core structures, preventing overoptimistic performance estimates from molecular similarity leakage.
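The scaffold-based splitting idea can be sketched as follows. To stay self-contained, the sketch assumes Bemis-Murcko scaffold keys have been precomputed (e.g., with RDKit's `MurckoScaffold`) and passed in as plain strings; the largest-groups-to-train convention mirrors common practice but is not necessarily the exact procedure used here.

```python
from collections import defaultdict

def scaffold_split(mol_ids, scaffolds, test_fraction=0.2):
    """Split molecules so no scaffold appears in both train and test.

    mol_ids:   list of molecule identifiers
    scaffolds: dict mapping molecule id -> scaffold key (e.g., a precomputed
               Bemis-Murcko scaffold SMILES)
    Whole scaffold groups are assigned to one side only, preventing
    similarity leakage between train and test.
    """
    groups = defaultdict(list)
    for m in mol_ids:
        groups[scaffolds[m]].append(m)
    # Largest scaffold series first: big congeneric series land in train
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_test = int(round(test_fraction * len(mol_ids)))
    train, test = [], []
    for group in ordered:
        if len(test) + len(group) <= n_test:
            test.extend(group)
        else:
            train.extend(group)
    return train, test

scaf = {'a': 'S1', 'b': 'S1', 'c': 'S1', 'd': 'S1', 'e': 'S1', 'f': 'S1',
        'g': 'S2', 'h': 'S2', 'i': 'S3', 'j': 'S3'}
train, test = scaffold_split(list('abcdefghij'), scaf, test_fraction=0.2)
```

Because every test molecule carries a scaffold never seen in training, reported metrics reflect generalization to genuinely novel chemotypes.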
| Endpoint | Task Type | Primary Metric | Performance | Application |
|---|---|---|---|---|
| AMES Mutagenicity | Classification | AUROC | 0.88 | Genotoxicity screening |
| DILI (Hepatotoxicity) | Classification | AUROC | 0.79 | Liver safety assessment |
| hERG Blockade | Classification | AUROC | 0.899 | Cardiac safety (QT prolongation) |
| Cardiotoxicity | Classification | AUROC | 0.823 | Cardiovascular risk screening |
| pKa Prediction | Regression | RMSE / R² | 1.51 / 0.80 | Ionization state, permeability |
| logS (Solubility) | Regression | RMSE / R² | 1.09 / 0.74 | Formulation, bioavailability |
| COX-1 pIC50 | Regression | RMSE | 0.603 | GI toxicity prediction (NSAIDs) |
| COX-2 pIC50 | Regression | RMSE | 0.775 | Anti-inflammatory efficacy |
Final compound scores are computed using a weighted multi-objective function that balances binding affinity with ADMET properties. Users can adjust weights for different optimization goals (e.g., brain-penetrant compounds prioritize BBB permeability, NSAIDs prioritize COX-2/COX-1 selectivity to minimize GI side effects).
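The weighted multi-objective function can be sketched as below. The property names and weight profile are hypothetical examples, and each property is assumed to be pre-normalized to [0, 1] with 1 = desirable (so risk endpoints such as hERG blockade would be inverted upstream).

```python
def composite_score(compound, weights):
    """Weighted multi-objective score over pre-normalized properties.

    compound: dict of property -> value in [0, 1], 1 = desirable
    weights:  dict of property -> non-negative weight (normalized here,
              so different weight profiles remain comparable)
    """
    total_w = sum(weights.values())
    return sum(weights[k] * compound[k] for k in weights) / total_w

# Hypothetical NSAID-oriented profile: favor COX-2 selectivity and GI safety
nsaid_weights = {'binding_affinity': 0.35, 'cox2_selectivity': 0.30,
                 'gi_safety': 0.20, 'solubility': 0.15}
cand = {'binding_affinity': 0.8, 'cox2_selectivity': 0.9,
        'gi_safety': 0.7, 'solubility': 0.6}
score = composite_score(cand, nsaid_weights)
```

Swapping in a different weight profile (e.g., one emphasizing BBB permeability) re-ranks the same candidate set for a different optimization goal without re-running any predictions.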
Validated against multiple independent test sets and prospective screening campaigns.
ProteinLab.ai has been applied to diverse therapeutic targets across oncology, neurology, infectious disease, and inflammation, accelerating hit identification and lead optimization.
Screen novel kinase inhibitors against predicted structures of mutant EGFR, ALK, and ROS1. Identify selective inhibitors for resistance mutations (e.g., EGFR T790M, ALK G1202R). Prioritize compounds with favorable CNS penetration for brain metastases.
Design BBB-penetrant compounds targeting neurological disorders. Optimize for P-glycoprotein efflux avoidance while maintaining target engagement. Applied to GPCRs (D2R, 5-HT2A), ion channels (Nav1.7), and metabolic enzymes (MAO-B).
Rapid screening against viral proteases and polymerases (SARS-CoV-2 Mpro, HIV protease, HCV NS5B). Structure-based design of pan-viral inhibitors. Integration with resistance mutation databases for future-proof drug design.
Target inflammatory mediators (COX-2, mPGES-1, FLAP) with improved selectivity profiles. Screen for dual inhibitors (e.g., COX-2/mPGES-1). Optimize for reduced GI toxicity and cardiovascular risk through ADMET profiling.
Identify non-orthosteric binding sites using ensemble docking and cryptic pocket detection. Design allosteric modulators for challenging targets (e.g., GPCRs, nuclear receptors). Improved selectivity and reduced on-target toxicity.
Expand fragment hits through structure-guided elaboration. Virtual linking and merging of adjacent fragments. Scaffold hopping to explore novel chemotypes while maintaining binding mode. ADMET optimization throughout the process.
ProteinLab.ai builds upon peer-reviewed research and state-of-the-art computational methods.