AI in Drug Discovery: AlphaFold 3, Generative Chemistry, and the Tools Rewriting the Pipeline in 2026
Three years. That is how long Dr. Priya Sharma’s team at a mid-size UK biotech spent trying to crystallize GPR84 — a lipid-sensing receptor implicated in fibrosis and inflammatory disease. The protein simply would not cooperate in the lab. Then AlphaFold 3 dropped in May 2024, and she had a high-confidence predicted structure — receptor, co-factors, and a test ligand docked in — within twenty minutes of signing into the server. The bottleneck did not go away. It moved.
That shift is what this article is actually about. Structure prediction is no longer the rate-limiting step in early drug discovery. What happens next — finding a ligand that binds well, designing molecules that actually reach their target and don’t kill the patient, and predicting whether any of this will work in vivo — is where the real competition between AI systems is playing out in 2026. AlphaFold gets all the headlines, but the tools sitting downstream of it are arguably doing more to compress the drug discovery timeline.
The traditional pipeline from target identification to approved drug runs twelve to fifteen years and costs, on average, $2.6 billion per molecule that makes it through. Most candidates fail in Phase II, long after hundreds of millions of dollars have been spent. AI cannot fix clinical trial failure rates caused by complex biology. What it can do is dramatically compress the first two to three years — the part where you are still trying to understand what a protein looks like and whether any small molecule on Earth might bind to it usefully. That part is now almost unrecognizable compared to 2022.
This is a guide to the ten AI systems reshaping that early pipeline — what each one actually does, how to access it, where it fits in the workflow, and, critically, where it breaks down. You will walk away knowing how AlphaFold 3 differs from Boltz-1 and Chai-1, why ESM-3 is doing something genuinely different from structure prediction, and what a full AI-assisted lead discovery campaign looks like end to end in 2026.
Biology is, at its most reductive, a problem of shape. Proteins do what they do because of how they fold, and drugs work by fitting into the pockets those folds create. For most of the twentieth century, determining that shape required growing protein crystals and blasting them with X-rays — a process that takes months when it works and fails entirely for roughly forty percent of medically relevant targets. The information bottleneck was not chemistry or biology; it was structure.
AlphaFold 2 cracked that problem in 2021 in a way that still feels slightly unreal in retrospect. The AlphaFold Protein Structure Database now contains predicted structures for over 200 million proteins — essentially every known protein sequence. AlphaFold 3, released in May 2024, pushed further: it predicts not just isolated protein chains but full molecular complexes including DNA, RNA, small molecule ligands, ions, and post-translational modifications. That distinction matters enormously for drug design, because drugs bind to proteins in the context of other molecules. An isolated protein structure and a structure of the same protein with its natural substrate already docked behave very differently when you try to design a competitor ligand.
The field did not stand still while Google DeepMind built AF3. The Baker Lab at the University of Washington released RoseTTAFold All-Atom in March 2024, with similar all-atom capabilities and a more permissive academic license. MIT’s Boltz-1 followed in November 2024 under the MIT license, meaning commercial use is unrestricted. Chai Discovery released Chai-1 in September 2024 with particularly strong antibody-antigen prediction. These are not distant second-place tools; on most benchmarks they sit within a few percentage points of AlphaFold 3’s accuracy, and for commercial applications where AF3’s server terms restrict use, they are the practical choice.
Structure prediction is now a commodity with multiple competitive options. The real differentiation in 2026 is in what comes after the structure: generative molecular design, ADMET prediction, and synthesis accessibility scoring. AlphaFold 3 is not the end of the pipeline — it is the starting gun.
The generative side of the pipeline has moved just as fast. Insilico Medicine’s Chemistry42 platform designed ISM001-055 — a small molecule for idiopathic pulmonary fibrosis — in 26 days, and that compound entered Phase II clinical trials in 2023. That is the first fully AI-generated drug candidate to reach that milestone. ESM-3, released by EvolutionaryScale in June 2024, adds a genuinely new capability: it is a multimodal protein language model that can generate novel protein sequences conditioned on partial structure and function specifications. In one published demonstration, it designed esmGFP — a novel fluorescent protein estimated to be 500 million years of evolutionary divergence away from any known natural GFP. That is not structure prediction. That is protein creation.
“The question is no longer whether AI can predict a protein structure. The question is whether it can design the molecule that binds to that structure — and whether that molecule can survive contact with a human body.”
— Common framing among computational chemists in 2026
Before looking at individual tools, it helps to see where each one sits in the overall workflow. Drug discovery is not a single task — it is a chain of interdependent problems, and AI systems tend to be specialized for specific links in that chain. Misplacing a tool in the pipeline (using a structure predictor where you need a generative designer, for instance) produces confusion rather than results.
AlphaFold 3, Boltz-1, Chai-1 — predict 3D structure of the target protein and its complexes
DiffDock, Schrödinger Glide — identify druggable pockets, screen fragment libraries via virtual docking
Chemistry42, RFdiffusion, ESM-3, MolMIM — generate novel molecules or protein binders against the identified pocket
ADMET-AI, Chemprop, Chemistry42 filters — predict absorption, distribution, metabolism, excretion, toxicity before synthesis
Schrödinger FEP+, Evo — physics-informed binding affinity ranking, iterative generative cycles
ASKCOS, AiZynthFinder — retrosynthetic route planning, identify synthesizable candidates before ordering CRO work
Access modes vary considerably across the stack. AlphaFold 3 runs in a browser (alphafoldserver.com) with no local setup — though commercial use requires a separate agreement with Google DeepMind. Boltz-1 installs via pip and runs locally on a GPU; so does ESM-3 for smaller model sizes. NVIDIA BioNeMo provides a cloud API for DiffDock, ESMFold, and MolMIM with no local GPU required, paid per inference. Schrödinger and Insilico’s Chemistry42 are commercial platforms with enterprise pricing. RFdiffusion is fully open-source and runs on a single A100.
For teams without GPU infrastructure, the BioNeMo API route is the most practical entry point to the diffusion-based tools. For commercial lead optimization, Schrödinger’s platform remains the gold standard because it pairs ML scoring with physics-based free energy perturbation — a combination that pure ML tools cannot replicate for late-stage candidate ranking.
Released by Google DeepMind in May 2024, AlphaFold 3 is the most significant update to the AlphaFold family since the original breakthrough. The core advance over AlphaFold 2 is straightforward to state and hard to overstate in practice: it predicts molecular complexes, not just isolated protein chains. You can submit a protein sequence alongside a small molecule, a nucleotide sequence, or both, and receive a predicted 3D structure showing how they interact. One caveat: the hosted server offers only a fixed menu of common ligands and ions, so arbitrary SMILES input requires the downloadable inference code that DeepMind released in November 2024 for non-commercial use.
This matters for drug design because the binding-competent conformation of a protein (the shape it takes when a ligand is already sitting in the pocket) often differs substantially from its apo (ligand-free) form. AF3’s diffusion-based architecture, which generates atomic coordinates by iteratively denoising from random positions, models protein and ligand together, something AF2 could not do because it predicted protein chains only. Confidence is reported per-residue as pLDDT (0–100, where >70 is usable) and as Predicted Aligned Error (PAE, in ångströms) for inter-chain contacts.
# ACCESS: alphafoldserver.com (free, browser-based, no local GPU)
# NOTE: Commercial use requires separate agreement with Google DeepMind
# NOTE: The hosted server offers a fixed ligand list; free-form SMILES input
#       applies to the downloadable AF3 inference code

--- Input (JSON or web form) ---
{
  "name": "[YOUR_TARGET_NAME]",
  "sequences": [
    {
      "proteinChain": {
        "sequence": "[PROTEIN_AMINO_ACID_SEQUENCE]",
        "count": 1
      }
    },
    {
      "ligand": {
        "smiles": "[LIGAND_SMILES]",  // e.g. aspirin: CC(=O)Oc1ccccc1C(=O)O
        "count": 1
      }
    }
  ]
}

--- Output files ---
[NAME]_model_0.cif        // top-ranked predicted structure
[NAME]_full_data_0.json   // pLDDT, PAE matrices per residue pair

--- Quality filters before proceeding ---
Accept structure if: pLDDT > 70 in binding region, PAE < 10 Å between protein and ligand
Flag for caution:    loop regions with pLDDT < 50 near the binding site
Why It Works: AF3’s diffusion architecture generates the full atomic assembly rather than predicting protein and ligand positions separately. This allows it to capture protein conformational changes induced by ligand binding — a critical detail that structure-then-dock workflows systematically miss.
How to Adapt It: For membrane proteins (GPCRs, ion channels) where crystallography consistently fails, AF3 is often the only practical route to a predicted bound complex. Submit with known endogenous ligand first to anchor the pocket geometry, then run your candidate ligand in a second submission for comparison.
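The accept/flag rules listed for AlphaFold 3 output can be applied mechanically once the confidence arrays are loaded. The following is a minimal sketch, assuming you have already extracted a per-residue pLDDT list and a square PAE matrix from the output JSON; the function name, argument names, and the toy numbers are illustrative and not part of any AlphaFold tooling.

```python
from statistics import mean


def passes_quality_gate(plddt, pae, pocket, ligand_idx,
                        min_plddt=70.0, max_pae=10.0):
    """Apply the article's accept rule to one predicted complex.

    plddt      : list of per-residue (per-token) pLDDT values, 0-100
    pae        : square matrix (list of lists) of PAE values in angstroms
    pocket     : indices of binding-site residues
    ligand_idx : indices of ligand tokens in the same numbering
    """
    pocket_plddt = mean(plddt[i] for i in pocket)
    # PAE is not symmetric, so average over both directions.
    cross = [pae[i][j] for i in pocket for j in ligand_idx]
    cross += [pae[j][i] for i in pocket for j in ligand_idx]
    return pocket_plddt > min_plddt and mean(cross) < max_pae


# Toy 4-token example: tokens 0-2 are protein residues, token 3 is the ligand.
plddt = [55.0, 82.0, 91.0, 78.0]
pae = [
    [0.5, 4.0, 6.0, 14.0],
    [4.0, 0.5, 2.0, 5.0],
    [6.0, 2.0, 0.5, 3.0],
    [13.0, 6.0, 4.0, 0.5],
]
print(passes_quality_gate(plddt, pae, pocket=[1, 2], ligand_idx=[3]))  # True
print(passes_quality_gate(plddt, pae, pocket=[0], ligand_idx=[3]))     # False
```

Averaging over the pocket is one reasonable choice; a stricter gate could require every pocket residue to clear the threshold.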
The commercial restriction on AlphaFold 3’s server (no commercial use without a separate DeepMind agreement) created an immediate market for open alternatives. Two arrived within months of each other. MIT’s Boltz-1, released under the MIT license in November 2024, matches AlphaFold 3 on the majority of CASP15 benchmarks and runs locally via a straightforward Python package. Chai Discovery’s Chai-1, released in September 2024, is particularly strong on antibody-antigen and multi-chain assemblies, a known weak point for earlier open-source tools.
Neither is strictly a copy of AF3’s architecture. Boltz-1 uses a similar diffusion-based approach but introduces changes in how multiple sequence alignments are incorporated. Chai-1 takes a different stance on handling conformational flexibility. For most drug discovery applications — particularly early hit identification and binding pocket characterization — they produce structures that are effectively indistinguishable from AF3 in practical utility.
# INSTALL (requires Python 3.10+, CUDA GPU recommended)
pip install boltz

# Create input YAML: input.yaml
version: 1
sequences:
  - protein:
      id: A
      sequence: "[PROTEIN_AMINO_ACID_SEQUENCE]"
  - ligand:
      id: B
      smiles: "[LIGAND_SMILES_STRING]"

# RUN with free MSA server (no local MSA database needed)
boltz predict input.yaml --use_msa_server --output_format mmcif

# OUTPUT: boltz_results_input/predictions/input/
#   input_model_0.cif               top-ranked structure
#   confidence_input_model_0.json   aggregate confidence scores (pTM, ipTM, pLDDT)

# For Chai-1 instead (install: pip install chai-lab)
# from chai_lab.chai1 import run_inference   (see docs.chaidiscovery.com)
Why It Works: Permissive open-source licensing (MIT for Boltz-1, Apache 2.0 for Chai-1) removes the contractual friction that makes AF3 impractical for commercial drug programs. Running locally also means protein sequences never leave your infrastructure, an important consideration for proprietary targets ahead of patent filing.
How to Adapt It: For antibody design programs, Chai-1 has shown consistently stronger performance on antibody-antigen complex geometry. Run both Boltz-1 and Chai-1 on the same target and compare PAE scores across the interface region; disagreement between them often flags genuinely uncertain binding modes worth further investigation.
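The Boltz-1 versus Chai-1 comparison suggested above comes down to averaging PAE over the cross-chain block of each model's matrix and checking how far apart the two averages are. Here is a rough sketch of that comparison; the 3 Å disagreement tolerance and the toy matrices are assumptions chosen for illustration, not values from either tool.

```python
def interface_pae(pae, chain_a, chain_b):
    """Mean PAE (angstroms) over all cross-chain pairs, both directions."""
    vals = [pae[i][j] for i in chain_a for j in chain_b]
    vals += [pae[j][i] for i in chain_a for j in chain_b]
    return sum(vals) / len(vals)


def models_disagree(pae_1, pae_2, chain_a, chain_b, tolerance=3.0):
    """Return (flag, score_1, score_2); flag is True when the two models'
    interface PAE differ by more than `tolerance` angstroms."""
    s1 = interface_pae(pae_1, chain_a, chain_b)
    s2 = interface_pae(pae_2, chain_a, chain_b)
    return abs(s1 - s2) > tolerance, s1, s2


# Toy example: tokens 0-1 belong to the antigen, token 2 to the antibody.
pae_model_1 = [[0.5, 2.0, 4.0], [2.0, 0.5, 6.0], [5.0, 5.0, 0.5]]
pae_model_2 = [[0.5, 2.0, 12.0], [2.0, 0.5, 14.0], [13.0, 13.0, 0.5]]

flag, s1, s2 = models_disagree(pae_model_1, pae_model_2, [0, 1], [2])
print(flag, s1, s2)  # True 5.0 13.0
```

A flagged target is a candidate for closer inspection, not a verdict on which model is right.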
ESM-3 from EvolutionaryScale is doing something categorically different from the structure prediction tools above, and the distinction is easy to miss in popular coverage. Structure prediction answers the question: given this sequence, what shape does it fold into? ESM-3 answers a different question: given a partial specification of what I want — some sequence residues, some structural features, some functional annotations — can you generate a protein that satisfies all those constraints simultaneously?
Trained on 2.78 billion protein sequences and 236 million structures, the 98-billion-parameter model can take a partial sequence with unknown positions marked, a secondary structure specification, and a functional keyword, and generate coherent protein designs that satisfy all three simultaneously. The published demonstration of esmGFP, a fluorescent protein sharing only about 58% sequence identity with the closest known fluorescent protein, offers a concrete picture of what “generative” means here. The model was not retrieving a known protein; it was creating one in a region of sequence space that natural evolution is not known to have explored.
# pip install esm (open weights: esm3-open-2024-03)
# Larger models (98B) via EvolutionaryScale Forge API
from esm.models.esm3 import ESM3
from esm.sdk.api import ESMProtein, GenerationConfig

client = ESM3.from_pretrained("esm3-open-2024-03")

# Design a protein binder: specify known terminus residues,
# leave internal region as "_" for the model to fill in.
protein = ESMProtein(
    sequence="[N_TERM_ANCHOR_RESIDUES]"
    + "_" * [NUM_RESIDUES_TO_GENERATE]
    + "[C_TERM_ANCHOR_RESIDUES]",
    # Optionally constrain secondary structure:
    # secondary_structure="HHHHHH...EEEEEE..."  (H=helix, E=sheet, C=coil)
)

generated = client.generate(
    protein,
    GenerationConfig(track="sequence", num_steps=8, temperature=0.7),
)

# Retrieve generated sequence and predicted structure:
print(generated.sequence)  # full generated amino acid sequence

structure = client.generate(  # fold the generated sequence
    ESMProtein(sequence=generated.sequence),
    GenerationConfig(track="structure", num_steps=8),
)
Why It Works: ESM-3 was trained with a joint objective across sequence, structure, and function tracks simultaneously — not just on sequence-to-structure mappings. This means conditioning on structure constraints genuinely guides sequence generation rather than being a post-hoc filter.
How to Adapt It: For therapeutic antibody programs, ESM-3 can generate CDR-H3 loop sequences (the most variable and target-specific part of an antibody) conditioned on a target epitope structure from AF3. Fix the framework regions, specify the epitope contact geometry as structural constraints, and generate 50-100 CDR-H3 candidates for experimental screening.
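The masked-prompt pattern used in the ESM-3 example (fixed anchor residues with `_` marking positions to generate) is easy to get wrong by hand. A small helper like the one below can build and validate the prompt string before it is passed to the model; `build_masked_prompt` is a hypothetical name introduced here, and the anchor sequences in the example are made up.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def build_masked_prompt(n_term, c_term, n_masked):
    """Return n_term + '_' * n_masked + c_term after basic validation.

    '_' marks positions left for the model to fill in, as in the
    ESM-3 example; the anchors must be standard one-letter residues.
    """
    for label, part in (("n_term", n_term), ("c_term", c_term)):
        bad = set(part) - VALID_RESIDUES
        if bad:
            raise ValueError(f"{label} has non-standard residues: {sorted(bad)}")
    if n_masked < 1:
        raise ValueError("n_masked must be at least 1")
    return n_term + "_" * n_masked + c_term


prompt = build_masked_prompt("MKT", "GSW", 5)
print(prompt)       # MKT_____GSW
print(len(prompt))  # 11
```

The resulting string would be passed as the `sequence` argument of `ESMProtein` in place of the bracketed placeholders.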
Virtual docking — predicting how a small molecule positions itself inside a protein pocket — is the computational step that bridges structure prediction and generative design. Traditional tools like AutoDock Vina treat docking as an optimization problem, sampling the molecule’s position and orientation through a scoring function. They work reasonably well when the binding site is known in advance. DiffDock, from MIT, takes a different approach: it frames docking as a generative problem and uses a diffusion model operating in SE(3) space (the mathematical space of rotations and translations in 3D) to simultaneously predict binding site and binding pose.
On blind docking benchmarks, where the algorithm must identify both the binding site and the pose without any prior information, the original DiffDock paper reported a 38% top-1 success rate (ligand RMSD below 2 Å) on PDBBind, against roughly 23% for the best traditional docking baseline. The later PoseBusters evaluation is a useful caution: on its set of newer complexes, classical tools such as AutoDock Vina remained competitive and some deep-learning poses failed basic physical-validity checks, so predicted poses should be sanity-checked before use. The 2024 DiffDock-L variant improves further on larger protein targets. This matters most in early hit discovery, where you have a protein structure and a fragment library and need to rank thousands of molecules by predicted binding quality without any crystallographic reference.
# OPTION A: Local (GPU required)
git clone https://github.com/gcorso/DiffDock.git && cd DiffDock
pip install -r requirements.txt

# --samples_per_complex 40 generates 40 pose candidates
python inference.py \
  --protein_path [RECEPTOR_PDB_OR_CIF_FILE] \
  --ligand [LIGAND_SMILES_OR_SDF_FILE] \
  --out_dir results/[COMPOUND_NAME] \
  --inference_steps 20 \
  --samples_per_complex 40 \
  --batch_size 10 \
  --no_final_step_noise

# Top ranked pose:  results/[COMPOUND_NAME]/rank1.sdf
# Confidence score: results/[COMPOUND_NAME]/rank1_confidence.txt

# OPTION B: NVIDIA BioNeMo API (no local GPU)
import requests

response = requests.post(
    "https://health.api.nvidia.com/v1/biology/diffdock",
    headers={"Authorization": "Bearer [NVIDIA_API_KEY]"},
    json={
        "protein": "[PDB_STRING_OR_URL]",
        "ligand": "[SMILES_STRING]",
        "num_poses": 20,
        "time_divisions": 20,
    },
)
response.raise_for_status()  # fail loudly on HTTP errors
Why It Works: SE(3) diffusion allows the model to explore the full space of protein-ligand orientations simultaneously rather than hill-climbing from an initial guess. This substantially reduces the risk of missing a true binding mode because the starting pose was wrong.
How to Adapt It: For fragment-based screening campaigns, run DiffDock across a library of 1,000-5,000 fragments against your AF3-predicted structure. Keep poses with a confidence score above 0.6, then cluster them by RMSD to identify distinct binding modes. This generates a shortlist for physical fragment screening in a fraction of the time that experimental high-throughput screening would take.
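The filter-then-cluster step for docking poses can be sketched in plain Python. This version assumes each pose has already been read into a confidence value plus a list of (x, y, z) heavy-atom coordinates in the receptor frame (reading SDF files, for example with RDKit, is not shown); the greedy clustering scheme and the 2 Å cutoff are illustrative choices rather than anything DiffDock prescribes.

```python
import math


def rmsd(a, b):
    """RMSD between two poses given as equal-length lists of (x, y, z),
    same atom order, no re-alignment (both are in the receptor frame)."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def cluster_poses(poses, min_conf=0.6, rmsd_cutoff=2.0):
    """poses: list of (confidence, coords). Poses at or below min_conf are
    dropped; the rest are visited from most to least confident and join
    the first cluster whose representative lies within rmsd_cutoff."""
    kept = sorted((p for p in poses if p[0] > min_conf),
                  key=lambda p: p[0], reverse=True)
    clusters = []
    for conf, coords in kept:
        for members in clusters:
            if rmsd(coords, members[0][1]) <= rmsd_cutoff:
                members.append((conf, coords))
                break
        else:
            clusters.append([(conf, coords)])
    return clusters


# Toy two-atom poses: A and B overlap, C sits elsewhere, D is low confidence.
poses = [
    (0.9, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),    # A
    (0.8, [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]),    # B
    (0.7, [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]),  # C
    (0.2, [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)]),    # D
]
clusters = cluster_poses(poses)
print([len(c) for c in clusters])  # [2, 1]
```

Each cluster's first member is its most confident pose, which makes it a natural representative of that binding mode.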
Where DiffDock predicts how existing molecules bind to proteins, RFdiffusion designs entirely new proteins intended to bind a target. Developed by the Baker Lab at the Institute for Protein Design, it applies diffusion over protein backbone coordinates — generating novel protein structures from noise, guided by constraints derived from a target binding site. The 2024 extension, RFdiffusion All-Atom, adds the ability to design proteins around small molecules, enabling the design of custom binding proteins for any chemical scaffold of interest.
The practical application that has generated the most pharma interest is binder design: given a target protein surface (a cytokine, a viral receptor, a membrane protein loop), RFdiffusion generates backbone coordinates for a protein that would bind that surface. ProteinMPNN is then used to design amino acid sequences for the generated backbone. Experimental validation rates for computationally designed binders using this pipeline have reached 20-40% in recent publications — meaning one in three to five computational designs actually binds when synthesized and tested. That compares to essentially zero for random sequence searches.
# Clone and install: github.com/RosettaCommons/RFdiffusion

# STEP 1: Generate backbone for binder against target hotspot residues
#   ppi.hotspot_res example: [A45,A48,A52]
#   inference.num_designs: recommend 100–500
# (bracketed arguments are single-quoted so the shell passes them through intact)
python run_inference.py \
  diffuser.T=200 \
  inference.input_pdb=[TARGET_PDB_FILE] \
  'contigmap.contigs=["[TARGET_CHAIN][HOTSPOT_START]-[HOTSPOT_END]/0 [BINDER_LENGTH_MIN]-[BINDER_LENGTH_MAX]"]' \
  'ppi.hotspot_res=["[CHAIN][RESIDUE_NUMBERS_CSV]"]' \
  inference.num_designs=[NUM_BACKBONES] \
  inference.output_prefix=designs/binder_

# STEP 2: Design sequences for generated backbones via ProteinMPNN
#   --sampling_temp: lower = more conserved core packing
python protein_mpnn_run.py \
  --pdb_path designs/binder_0.pdb \
  --out_folder sequences/ \
  --num_seq_per_target 8 \
  --sampling_temp 0.1

# STEP 3: Validate designs
# Run AlphaFold3 or Boltz-1 on each (protein + target complex)
# Accept: pAE_interaction < 10, ipTM > 0.6
Why It Works: Diffusion over backbone coordinates allows RFdiffusion to explore protein fold space unconstrained by evolutionary precedent. The resulting binders can have entirely novel topologies that no natural protein has ever adopted — which is often exactly what you want when trying to address a surface that evolution has not targeted.
How to Adapt It: For PPI (protein-protein interaction) inhibitors, which are among the hardest targets in drug discovery, RFdiffusion can design miniprotein binders (50-70 residues) that compete with the natural protein partner. These are not small molecules, but they are often the only viable path to disrupting a flat, featureless binding interface.
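The 20-40% experimental hit rates quoted for designed binders translate directly into how many designs are worth ordering. As a back-of-the-envelope sketch, assuming each design succeeds independently with the same probability p (a simplification, since designs from one campaign are correlated), the smallest n giving at least one binder with a chosen confidence satisfies 1 - (1 - p)^n >= confidence.

```python
import math


def designs_needed(hit_rate, confidence=0.95):
    """Smallest n such that 1 - (1 - hit_rate)**n >= confidence,
    assuming independent designs with identical success probability."""
    if not 0.0 < hit_rate < 1.0:
        raise ValueError("hit_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - hit_rate))


print(designs_needed(0.20))  # 14
print(designs_needed(0.40))  # 6
```

At a 1% hit rate the same calculation gives about 300 designs, which shows how much the improved validation rates reduce wet-lab cost.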
If ESM-3 and RFdiffusion represent the protein design side of generative AI, Chemistry42 represents its small molecule equivalent — and it has the most concrete clinical validation of any generative chemistry platform to date. Insilico Medicine’s platform uses an ensemble of over 80 generative algorithms to design novel small molecules against a target, with ADMET property optimization built directly into the generation loop rather than applied as a post-hoc filter. The difference matters: generating molecules that are potent but toxic is easy; generating ones that are potent, selective, and orally bioavailable simultaneously is the actual problem.
The headline figure is ISM001-055, a small molecule for idiopathic pulmonary fibrosis designed in 26 days using Chemistry42. It entered Phase II clinical trials in 2023 — the first fully AI-generated drug to reach that milestone. More recent programs using Chemistry42 v3 (released 2025) operate at higher throughput: the platform can run iterative generate-dock-filter-regenerate cycles automatically, returning a ranked lead series within hours rather than weeks for well-characterized targets.
# ACCESS: platform.insilico.com (web UI or REST API)
# API request body (JSON):
{
  "project": "[YOUR_PROJECT_NAME]",
  "target": {
    "protein_structure": "[PDB_ID_OR_UPLOADED_CIF]",
    "binding_site_residues": [[RESIDUE_NUMBER_LIST]],   // e.g. [45, 48, 72, 89, 104]
    "hit_smiles": ["[REFERENCE_LIGAND_SMILES]"]         // optional seed for optimization mode
  },
  "design_parameters": {
    "mode": "generative",                               // or "optimization" to improve a hit
    "admet_filters": ["LogP", "hERG", "Ames", "CYP3A4"],
    "property_targets": {
      "LogP": {"min": 1, "max": 4},
      "MW": {"max": 500},
      "hERG_IC50_uM": {"min": 10}                       // cardiac safety: hERG > 10 µM
    },
    "novelty_weight": 0.7,                              // 0–1, higher = more structural novelty
    "num_candidates": 500,
    "cycles": 3                                         // iterative generate → dock → filter → regenerate
  }
}

# OUTPUT: ranked SMILES with predicted IC50, ADMET scores, synthetic accessibility (SA Score)
# Recommend: filter SA Score < 3.5 before ordering synthesis
Why It Works: The iterative cycle approach means each generation round is conditioned on the docking results of the previous round. Molecules that score well on binding but fail ADMET filters get replaced in the next cycle with analogues that preserve the pharmacophore but improve the problematic property. This is the closest current AI analogue to what a medicinal chemist does manually over months of SAR work.
How to Adapt It: For fragment-to-lead expansion, feed the top DiffDock hits from Tool 4 as reference seeds in “optimization” mode. Chemistry42 will grow the fragment into a drug-like molecule while maintaining the binding mode identified by docking — a significant time saving over manual fragment elaboration.
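Whatever platform generates the candidates, the property windows quoted in this section (LogP 1 to 4, molecular weight at most 500, hERG IC50 at least 10 µM, Ames negative, SA Score below 3.5) can be re-applied locally as a final check. The sketch below assumes the predictions have been exported as a list of dictionaries; the key names, compound IDs, and values are hypothetical and do not reflect Chemistry42's actual export format.

```python
def passes_filters(mol):
    """Apply the property windows quoted in the article to one candidate."""
    return (
        1.0 <= mol["logp"] <= 4.0
        and mol["mw"] <= 500.0
        and mol["herg_ic50_um"] >= 10.0  # cardiac safety margin
        and not mol["ames_positive"]     # mutagenicity flag
        and mol["sa_score"] < 3.5        # synthetic accessibility
    )


def shortlist(candidates):
    """Keep candidates that pass every filter, most potent first."""
    kept = [m for m in candidates if passes_filters(m)]
    return sorted(kept, key=lambda m: m["pred_ic50_nm"])


# Hypothetical predictions for three generated molecules.
candidates = [
    {"id": "cmpd-1", "logp": 2.8, "mw": 412.0, "herg_ic50_um": 25.0,
     "ames_positive": False, "sa_score": 2.9, "pred_ic50_nm": 40.0},
    {"id": "cmpd-2", "logp": 5.1, "mw": 455.0, "herg_ic50_um": 30.0,
     "ames_positive": False, "sa_score": 3.1, "pred_ic50_nm": 5.0},
    {"id": "cmpd-3", "logp": 3.2, "mw": 388.0, "herg_ic50_um": 12.0,
     "ames_positive": False, "sa_score": 3.4, "pred_ic50_nm": 15.0},
]
print([m["id"] for m in shortlist(candidates)])  # ['cmpd-3', 'cmpd-1']
```

Note that the most potent molecule (cmpd-2) is rejected on LogP alone, which is the "potent but not drug-like" failure the section describes.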
Evo occupies a different corner of the AI drug discovery space from the protein-focused tools above. Released by the Arc Institute in February 2024, it is a 7-billion-parameter autoregressive model trained on 2.7 million microbial and viral genomes at single-nucleotide resolution. Its native vocabulary is DNA — not amino acids, not SMILES strings — which means it understands the relationship between genomic sequence, gene regulatory structure, and protein output in a way that protein-only models cannot.
The most immediately relevant application for drug discovery is zero-shot fitness prediction: given a gene sequence and a proposed point mutation, Evo predicts whether that mutation increases or decreases biological fitness without any fine-tuning on assay data. In published benchmarks, it outperforms ESM-2 (a protein language model) on predicting the fitness effects of mutations in essential bacterial genes — even though it is reasoning from DNA rather than protein sequence. It has also been used to generate novel CRISPR systems with predicted activity, and to design regulatory elements controlling gene expression.
# pip install evo-model (Arc Institute, Apache 2.0)
import torch
from evo import Evo, generate

device = "cuda:0"
evo_model = Evo("evo-1-131k-base")  # 131k context window variant
model, tokenizer = evo_model.model, evo_model.tokenizer
model.to(device)
model.eval()

# TASK 1: Zero-shot fitness prediction for a point mutation
# Compare log-likelihood of wildtype vs mutant sequence:
wildtype = "[DNA_SEQUENCE_WILDTYPE]"
mutant = "[DNA_SEQUENCE_WITH_MUTATION_AT_POS_N]"

def score_sequence(seq):
    ids = torch.tensor(tokenizer.tokenize(seq), dtype=torch.long)
    ids = ids.unsqueeze(0).to(device)
    with torch.no_grad():
        logits, _ = model(ids)
    # Sum log-probabilities of observed tokens (negated summed cross-entropy):
    return -torch.nn.functional.cross_entropy(
        logits[0, :-1].float(), ids[0, 1:], reduction="sum"
    ).item()

delta_fitness = score_sequence(mutant) - score_sequence(wildtype)
# delta_fitness > 0 → mutation predicted beneficial

# TASK 2: Conditional generation (e.g., novel CRISPR spacer)
# generate() is the package-level helper; check argument names against the evo README
prompt = "[PAM_SEQUENCE_5_TO_3][PARTIAL_TARGET_SEQUENCE_20nt]"
generated_seqs, _scores = generate(
    [prompt], model, tokenizer,
    n_tokens=[COMPLETION_LENGTH], temperature=0.7, top_k=4, device=device,
)
generated_seq = generated_seqs[0]
Why It Works: Training at DNA resolution means Evo learns codon usage, promoter logic, and cross-species sequence conservation simultaneously — a richer representation of biological function than protein sequence alone provides. Its zero-shot fitness predictions work without any target-specific training data, making it useful for novel targets where experimental assay data is sparse.
How to Adapt It: For target identification in infectious disease programs, Evo can score entire bacterial or viral genomes for essential gene candidates — positions where mutations consistently reduce fitness across diverse strains. This narrows the target space before any structure prediction work begins, saving significant downstream compute.
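Scanning a gene for mutation-sensitive positions amounts to enumerating every single-nucleotide variant and scoring each against the wild type. The sketch below separates that enumeration from the scorer, so a likelihood-based function such as the `score_sequence` shown for Evo could be plugged in; the GC-count scorer used here is only a stand-in to make the example runnable and has no biological meaning.

```python
def single_nucleotide_variants(seq):
    """Yield (position, ref, alt, mutant_sequence) for every point mutation."""
    for i, ref in enumerate(seq):
        for alt in "ACGT":
            if alt != ref:
                yield i, ref, alt, seq[:i] + alt + seq[i + 1:]


def rank_variants(seq, score_fn):
    """Return (delta, position, ref, alt) tuples sorted from most to least
    favourable, where delta = score(mutant) - score(wildtype)."""
    base = score_fn(seq)
    deltas = [(score_fn(mutant) - base, pos, ref, alt)
              for pos, ref, alt, mutant in single_nucleotide_variants(seq)]
    return sorted(deltas, reverse=True)


def toy_score(seq):
    """Stand-in scorer: counts G and C bases. Replace with a model score."""
    return seq.count("G") + seq.count("C")


ranked = rank_variants("ATGA", toy_score)
print(len(ranked))  # 12 variants for a 4-nt sequence
print(ranked[0])    # (1, 3, 'A', 'G')
```

Positions where every substitution yields a strongly negative delta are the ones a fitness model would flag as constrained.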
BioNeMo is not a single model — it is NVIDIA’s platform for deploying a curated collection of the best open-source drug discovery AI models as production-ready API microservices. As of 2026, the platform hosts over 30 pretrained models including ESMFold, ProtT5, DiffDock, RFdiffusion, MolMIM (a molecular generation model), and the AlphaFold2 Multimer variant, all accessible via standardized REST APIs called NIMs (NVIDIA Inference Microservices). The key proposition: you can call DiffDock, ESMFold, and MolMIM in sequence without managing any local GPU infrastructure.
Pharmaceutical industry adoption has been substantial. AstraZeneca, Amgen, and GSK have all announced BioNeMo integrations. NVIDIA’s $50 million investment in Recursion Pharmaceuticals, announced in July 2023, is the highest-profile signal of where the platform is heading. Recursion runs a 22-petabyte phenomics dataset through BioNeMo workflows, which points beyond inference API access toward integrated pipelines connecting biological imaging data, sequence models, and molecular generation in a single platform.
import requests

NVIDIA_KEY = "[YOUR_NVIDIA_API_KEY]"
HEADERS = {
    "Authorization": f"Bearer {NVIDIA_KEY}",
    "Content-Type": "application/json",
}

# STEP 1: Fold target protein via ESMFold NIM
fold_resp = requests.post(
    "https://health.api.nvidia.com/v1/biology/esmfold",
    headers=HEADERS,
    json={"sequence": "[TARGET_PROTEIN_SEQUENCE]"},
)
pdb_string = fold_resp.json()["pdbs"][0]

# STEP 2: Generate candidate molecules via MolMIM NIM
molmim_resp = requests.post(
    "https://health.api.nvidia.com/v1/chemistry/nvidia/molmim/generate",
    headers=HEADERS,
    json={
        "smi": "[SEED_SMILES]",                # known binder or fragment hit
        "num_molecules": 20,
        "iterations": 50,
        "scaffold": "[CORE_SCAFFOLD_SMILES]",  # optional: fix core ring system
        "property_name": "QED",                # optimize quantitative drug-likeness
        "min_similarity": 0.3,                 # Tanimoto minimum vs seed
    },
)
candidates = [m["smiles"] for m in molmim_resp.json()["molecules"]]

# STEP 3: Score candidates via DiffDock NIM (see Tool 4 for full params)
# Chain: fold → generate → dock → filter → return ranked SMILES
Why It Works: The NIM architecture standardizes API interfaces across models from different research groups, so you can swap ESMFold for AlphaFold2 Multimer (or future models as they are added) without changing your pipeline code. This is what separates it from running each model independently from GitHub repositories.
How to Adapt It: For high-throughput virtual screening of large compound libraries (10,000+ molecules), BioNeMo’s batch API endpoints allow parallelized docking runs at a cost that is negligible compared to physical screening. Run MolMIM to generate 500 analogues of a hit, then batch-dock all 500 against the target in a single API session.
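Batch-docking a few hundred generated analogues is mostly bookkeeping: deduplicate, split into request-sized chunks, collect scores, and rank. The helper below sketches that loop with the docking call injected as a function, so nothing here depends on a specific BioNeMo endpoint or response schema; `dock_batch` is a caller-supplied stand-in and the length-based scorer exists only to make the example run.

```python
def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def dock_library(smiles_list, dock_batch, batch_size=50):
    """Score every unique SMILES via dock_batch(chunk) -> list of scores,
    and return (smiles, score) pairs sorted best-first."""
    unique = list(dict.fromkeys(smiles_list))  # drop duplicates, keep order
    scored = []
    for chunk in batched(unique, batch_size):
        scores = dock_batch(chunk)
        if len(scores) != len(chunk):
            raise RuntimeError("scorer returned a different number of results")
        scored.extend(zip(chunk, scores))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def fake_dock(chunk):
    """Placeholder scorer: string length. Swap in a real docking call."""
    return [float(len(s)) for s in chunk]


ranked = dock_library(["CCO", "c1ccccc1", "CCO", "CC(=O)O"], fake_dock, batch_size=2)
print(ranked[0])  # ('c1ccccc1', 8.0)
```

Keeping the scorer as a parameter also makes it straightforward to swap docking back ends without touching the ranking logic.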
Everything covered so far operates primarily in the domain of machine learning. Schrödinger’s platform makes a different philosophical bet: the most accurate predictions of binding affinity require physics, not just statistics. Free Energy Perturbation (FEP+) calculates the thermodynamic free energy difference between two ligands binding to the same target by running molecular dynamics simulations with alchemical perturbations — mathematically transforming one ligand into another inside the binding pocket while tracking the energy cost of doing so.
The result is binding affinity predictions with root-mean-square errors of 1.0-1.2 kcal/mol against experimental data across validated datasets, far better than pure ML scoring functions, which typically achieve 1.5-2.5 kcal/mol RMSE. Combined with the OPLS4 force field and Schrödinger’s ML-accelerated FEP variant (which reduces simulation time by ~4x while preserving accuracy), this makes it practical to rank 50-100 lead compounds within a week rather than months. Over 100 pharmaceutical companies use FEP+ in their lead optimization pipelines; it is validated on more than 7,000 published ligand perturbations.
# Schrödinger Python API (requires Schrödinger Suite + license)
# NOTE: illustrative call pattern; confirm class and method names
#       against the Schrödinger Suite documentation for your release
from schrodinger.application.scisol.packages.fep import graph
from schrodinger.structure import StructureReader

# Define perturbation graph: reference compound + analogues to rank
fep_mapper = graph.Graph.from_smiles(
    reference_smiles="[REFERENCE_COMPOUND_SMILES]",
    smiles_list=[
        "[ANALOGUE_1_SMILES]",
        "[ANALOGUE_2_SMILES]",
        "[ANALOGUE_3_SMILES]",
        # ... up to ~50 analogues per FEP campaign
    ],
    protein_file="[RECEPTOR_PREPARED.mae]",
    forcefield="OPLS4",
)

results = fep_mapper.run(
    n_replicates=3,       # triplicate for statistical confidence
    simulation_time=5.0,  # nanoseconds per lambda window
    n_lambda=12,          # alchemical perturbation windows
    use_rest=True,        # REST2 enhanced sampling for flexible ligands
)

# results.ddG_df: DataFrame with predicted ΔΔG vs reference for each analogue
# Negative ΔΔG = predicted tighter binding than reference
# Prioritize compounds with ΔΔG < -1.0 kcal/mol AND passed ADMET filters
Why It Works: ML scoring functions learn correlations from training data; FEP computes free energy differences from explicit molecular dynamics simulation under a physical force field. This means FEP+ tends to hold up on chemical series that are underrepresented in ML training data, which is exactly the situation you are in when working with a novel scaffold from generative chemistry.
How to Adapt It: Use FEP+ as the final gate before committing to CRO synthesis, not as a primary screening tool. Feed the top 50 candidates from Chemistry42 or MolMIM through FEP+ to separate real binders from AI-generated artifacts. The computational cost (~$500-2,000 per FEP campaign on Schrödinger’s cloud) is negligible compared to the cost of synthesizing and testing 50 compounds physically.
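It helps to translate the kcal/mol figures in this section into fold changes in binding affinity. Using the standard relation ΔΔG = RT ln(Kd_analogue / Kd_reference), a ΔΔG of -1.0 kcal/mol at room temperature corresponds to roughly 5-fold tighter binding, and by the same arithmetic a 1.0 kcal/mol prediction error corresponds to roughly a 5-fold uncertainty in Kd. A minimal sketch, assuming 298.15 K:

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol*K)


def affinity_fold_change(ddg_kcal, temp_k=298.15):
    """Return Kd(reference) / Kd(analogue) implied by
    ddG = dG(analogue) - dG(reference), in kcal/mol.
    Values above 1 mean the analogue binds more tightly."""
    return math.exp(-ddg_kcal / (R_KCAL_PER_MOL_K * temp_k))


print(round(affinity_fold_change(-1.0), 2))  # about 5.41
print(round(affinity_fold_change(0.0), 2))   # 1.0 (no change)
```

This is why the gap between a 1.0 and a 2.5 kcal/mol RMSE matters: the uncertainty grows exponentially with the error, not linearly.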
The ten tools in this article are not independent options — they are sequential stages in a pipeline that, when run end to end, compresses what was once a two-to-four-year process into eight to sixteen weeks for well-characterized target classes. No single tool covers the whole chain. AlphaFold 3 gives you a structure; it tells you nothing about what molecule would bind usefully. Chemistry42 generates molecules; it requires a structure and a defined binding site to work against. Schrödinger FEP+ ranks candidates with high accuracy; it requires actual candidate molecules to rank.
The master workflow below integrates all five stages: structure prediction, binding site identification, generative design, physics-based ranking, and synthesis planning. It is not a theoretical architecture — variations of this pipeline are in active use at Insilico Medicine, Recursion, Exscientia, and the in-house computational chemistry groups of AstraZeneca, Pfizer, and Roche as of 2026. The specific tool choices at each stage can be swapped; the stage sequence cannot.
    ##############################################################
    # STAGE 1: Target Structure (2–4 hours)
    ##############################################################
    AF3_or_Boltz1.predict(
        protein_sequence=[TARGET_SEQUENCE],
        ligand_smiles=[KNOWN_COFACTOR_OR_ENDOGENOUS_LIGAND]  # anchor the active conformation
    )
    → Filter: pLDDT > 70 in binding region, PAE_interface < 10 Å
    → If uncertain: run Boltz-1 AND Chai-1 independently, compare pocket geometry

    ##############################################################
    # STAGE 2: Blind Docking / Binding Site Mapping (4–8 hours)
    ##############################################################
    DiffDock.blind_dock(
        receptor=[STAGE1_OUTPUT.cif],
        ligand_library=[FRAGMENT_LIBRARY_OR_KNOWN_HITS],  # 500–5,000 compounds
        n_poses=40,
        n_samples=20
    )
    → Filter: confidence > 0.6, cluster by binding site
    → Identify 1–3 distinct druggable pockets from cluster centroids

    ##############################################################
    # STAGE 3: Generative Design (1–5 days, iterative)
    ##############################################################
    Chemistry42.generate(
        target_structure=[STAGE1_OUTPUT],
        binding_site=[STAGE2_POCKET_RESIDUES],
        seed_smiles=[STAGE2_TOP_FRAGMENT],  # optional
        admet_filters=[LogP_1-4, MW_lt500, hERG_gt10uM, Ames_neg],
        cycles=3,
        num_candidates=500
    )
    → Filter: SA Score < 3.5 (synthesizability)
    → Redock top 200 with DiffDock to verify binding mode consistency

    ##############################################################
    # STAGE 4: Physics Validation (3–7 days)
    ##############################################################
    Schrodinger_FEP_plus.rank(
        reference=[BEST_DOCKING_HIT],
        analogues=[TOP_50_FROM_STAGE3],
        protein=[STAGE1_STRUCTURE_PREPARED.mae],
        forcefield="OPLS4"
    )
    → Prioritize: ΔΔG < -1.0 kcal/mol AND all ADMET filters passed
    → Typical output: 5–15 confirmed leads

    ##############################################################
    # STAGE 5: Synthesis Planning (1–2 days)
    ##############################################################
    ASKCOS.retrosynthesis(lead_smiles_list=[STAGE4_TOP_LEADS])
    → Flag: routes > 4 steps or requiring controlled reagents
    → Order synthesis from CRO for confirmed leads with SA Score < 3 AND ≤4-step route

    # TOTAL WALL TIME: 8–16 weeks to confirmed lead series
    # vs. traditional pipeline: 2–4 years to equivalent milestone
Why It Works: Each stage feeds the next with progressively higher-confidence information. Because the pipeline starts from a structure that has passed explicit confidence filters (not an unvetted AF2 output), it builds confidence at each step rather than compounding uncertainty. The ADMET-first philosophy, which filters for drug-likeness at generation time rather than at the end, eliminates the most common failure mode of pure ML generative approaches: molecules that are potent but undruggable.
How to Adapt It: For antibody drug programs, replace the Chemistry42 step with RFdiffusion + ESM-3 (Tools 5 and 3) for the generative stage, and replace Schrödinger FEP+ with Rosetta’s InterfaceAnalyzer for binding energy estimation. The pipeline structure is the same; the tools are swapped for the relevant molecular modality.
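The filter arrows in the workflow can be written down as plain predicates, which keeps the thresholds auditable and easy to tighten per project. This is a minimal sketch under stated assumptions: the `Candidate` fields and all function names are hypothetical, the numeric cutoffs are the ones quoted in the workflow above, and nothing here calls any of the tools themselves.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sa_score: float            # synthetic accessibility; lower is easier
    ddg: float                 # predicted ddG vs reference, kcal/mol
    admet_pass: bool
    route_steps: int           # length of the best retrosynthesis route
    controlled_reagents: bool


def structure_ok(pocket_plddt, interface_pae):
    """Stage 1 gate: confident pocket and confident interface."""
    return pocket_plddt > 70 and interface_pae < 10


def design_ok(c):
    """Stage 3 gate: synthesizability filter on generated molecules."""
    return c.sa_score < 3.5


def physics_ok(c):
    """Stage 4 gate: predicted tighter binding plus ADMET pass."""
    return c.ddg < -1.0 and c.admet_pass


def synthesis_ok(c):
    """Stage 5 gate: stricter SA cutoff, short route, no controlled reagents."""
    return c.sa_score < 3 and c.route_steps <= 4 and not c.controlled_reagents


def run_funnel(candidates):
    """Apply the gates in stage order and record how many survive each one."""
    survivors = list(candidates)
    counts = {"input": len(survivors)}
    gates = [("design", design_ok), ("physics", physics_ok), ("synthesis", synthesis_ok)]
    for label, gate in gates:
        survivors = [c for c in survivors if gate(c)]
        counts[label] = len(survivors)
    return survivors, counts


# Invented example candidates, not real compounds
cands = [
    Candidate("lead_a", 2.4, -1.6, True, 3, False),
    Candidate("lead_b", 3.2, -2.0, True, 4, False),
    Candidate("lead_c", 4.1, -2.5, True, 6, True),
    Candidate("lead_d", 2.0, -0.3, True, 2, False),
]
survivors, counts = run_funnel(cands)
print([c.name for c in survivors], counts)
```

Recording the per-gate counts is a deliberate choice: when a campaign ends with zero orderable leads, the counts show which stage removed them, which is usually more actionable than the final list.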
| Model | Released | Organization | Covers | Commercial Use | Access | Benchmark notes |
|---|---|---|---|---|---|---|
| AlphaFold 3 | May 2024 | Google DeepMind | Protein, DNA, RNA, ligand, ions, chemical modifications | Restricted | Web server (free) | Best-in-class protein-ligand docking; sets new bar for complex prediction |
| Boltz-1 | Nov 2024 | MIT | Protein, DNA, RNA, ligand | Apache 2.0 | pip install, local GPU | Matches AF3 within ~2% on CASP15 protein-ligand tasks |
| Chai-1 | Sep 2024 | Chai Discovery | Protein, ligand, antibody-antigen | Custom license | pip install / API | Top performer on antibody-antigen complex benchmarks |
| RoseTTAFold AA | Mar 2024 | Baker Lab / IPD | Protein + all atoms + small molecules | Academic free | Local (GitHub) | Strong on protein-small molecule covalent interactions |
| ESM-3 | Jun 2024 | EvolutionaryScale | Protein sequence + structure + function | Open weights (non-commercial) | pip / Forge API | First multimodal generative protein model; novel protein design validated |
| ESMFold | 2022 | Meta AI | Protein only (single chain) | MIT license | pip / BioNeMo API | Fast but 5–8% below AF3 accuracy; useful for speed-sensitive screening |
A predicted structure with pLDDT >90 is a hypothesis, not a measurement. AlphaFold 3 is remarkably accurate for structured domains — but loop regions, disordered termini, and induced-fit binding pockets can deviate significantly from reality. Using an AF3 structure directly for FEP calculations without at least molecular dynamics equilibration routinely produces misleading binding affinity predictions. Treat the predicted structure as a starting model for further refinement, not an endpoint.
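One practical way to act on this is to check confidence in the pocket itself rather than trusting a global average. The sketch below assumes a PDB-format file in which per-residue pLDDT is stored in the B-factor column (columns 61–66), which is the usual convention for AlphaFold-style PDB output; mmCIF output would need a CIF parser instead. The function name, the `ATOM_FMT` helper, and the demo values are hypothetical.

```python
def pocket_plddt(pdb_lines, chain, residues):
    """Return (mean, min) pLDDT over the listed residue numbers.

    Reads fixed-width PDB ATOM records and takes the B-factor of each
    residue's CA atom as that residue's pLDDT.
    """
    wanted = set(residues)
    per_residue = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        if line[21] != chain:                  # column 22: chain ID
            continue
        resseq = int(line[22:26])              # columns 23-26: residue number
        if resseq in wanted and line[12:16].strip() == "CA":
            per_residue[resseq] = float(line[60:66])  # columns 61-66: B-factor
    missing = wanted - per_residue.keys()
    if missing:
        raise ValueError(f"residues not found: {sorted(missing)}")
    values = list(per_residue.values())
    return sum(values) / len(values), min(values)


# Builds correctly aligned ATOM records for the demo (values are invented)
ATOM_FMT = (
    "ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
    "{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}"
)
demo = [
    ATOM_FMT.format(serial=i, name="CA", res="ALA", chain="A", resseq=n,
                    x=0.0, y=0.0, z=0.0, occ=1.0, b=b)
    for i, (n, b) in enumerate([(101, 92.5), (102, 88.0), (103, 61.5)], start=1)
]

mean_plddt, min_plddt = pocket_plddt(demo, "A", [101, 102, 103])
print(round(mean_plddt, 1), min_plddt)
```

Returning the minimum alongside the mean matters here: in the demo the pocket average clears a 70 cutoff while one residue does not, which is the pattern a flexible loop lining a binding site tends to produce.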
Running AlphaFold on your target is the equivalent of getting a map of the building. You still need to find the door, pick the lock, and confirm that entering actually does what you intended. A beautiful predicted structure with no downstream docking, generative design, or ADMET work is a publication, not a drug program. The structure is the starting point; five more computational stages and years of wet-lab work follow.
Generative models optimize for what they are trained to optimize. Chemistry42 optimizes binding affinity, drug-likeness, and a handful of ADMET properties simultaneously, but it can still produce molecules with poor synthetic accessibility (high SA Scores) that require 8+ step syntheses with controlled reagents. Always run ASKCOS or calculate SA Score before committing to synthesis. A molecule that is computationally brilliant but takes six months to synthesize in 3% yield is not useful.
Boltz-1 and Chai-1 are within a few percent of AlphaFold 3 on the vast majority of practical benchmarks as of 2026. The commercial restriction on AF3’s server is not a small technicality — it means you cannot use AF3-server-generated structures in patent filings without a Google DeepMind agreement. Boltz-1 (Apache 2.0) has no such restriction. For commercial programs, Boltz-1 is often the correct choice even if AF3 were marginally more accurate.
Treating ADMET as a post-filter is the single most common error in groups adopting AI-assisted discovery. Generate 10,000 molecules optimized purely for binding affinity, filter for ADMET at the end, and you will find that 90%+ of your “leads” are immediately disqualified. Chemistry42, MolMIM with QED optimization, and similar platforms apply ADMET constraints during each generative cycle, not as a post-filter. Set your ADMET parameters at the generation stage or accept that you are generating a lot of computationally expensive waste.
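The difference between constraining inside the loop and filtering afterwards can be illustrated with a toy simulation. Everything below is an assumption made for the sketch: `mutate` is a hypothetical stand-in for a generative model step (its upward drift in LogP and molecular weight loosely mimics potency-driven optimization), the property names are invented, and the thresholds mirror the filter list quoted in the workflow (LogP 1–4, MW < 500, hERG IC50 > 10 µM, Ames negative). It does not reproduce how Chemistry42 or MolMIM work internally.

```python
import random


def passes_admet(p):
    """Drug-likeness gate using the thresholds quoted in the workflow."""
    return (1.0 <= p["logp"] <= 4.0
            and p["mw"] < 500
            and p["herg_ic50_um"] > 10
            and not p["ames_positive"])


def mutate(parent, rng):
    """Hypothetical generative step: jitter the parent's properties."""
    return {
        "logp": parent["logp"] + rng.uniform(-0.8, 1.2),
        "mw": parent["mw"] + rng.uniform(-30, 60),
        "herg_ic50_um": max(0.1, parent["herg_ic50_um"] + rng.uniform(-6, 4)),
        "ames_positive": parent["ames_positive"] or rng.random() < 0.05,
    }


def run(seed_mol, cycles, per_cycle, constrain_in_loop, rng):
    """Run the toy loop and return the final molecules that pass ADMET."""
    parents = [seed_mol]
    for _ in range(cycles):
        children = [mutate(rng.choice(parents), rng) for _ in range(per_cycle)]
        if constrain_in_loop:
            # Only drug-like children may seed the next cycle; if none
            # survive, fall back to the previous parents.
            children = [c for c in children if passes_admet(c)] or parents
        parents = children
    return [m for m in parents if passes_admet(m)]


seed_mol = {"logp": 2.5, "mw": 350.0, "herg_ic50_um": 25.0, "ames_positive": False}
in_loop = run(seed_mol, cycles=3, per_cycle=500,
              constrain_in_loop=True, rng=random.Random(0))
post_hoc = run(seed_mol, cycles=3, per_cycle=500,
               constrain_in_loop=False, rng=random.Random(0))
print(len(in_loop), "usable with in-loop constraints;",
      len(post_hoc), "usable with a post-filter only")
```

The mechanism worth noticing is that in the unconstrained run, molecules that have already left drug-like space keep parenting the next cycle, so the final filter is applied to a population that has drifted. The exact counts depend on the invented drift parameters and should not be read as a benchmark.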
| Wrong Approach | Right Approach |
|---|---|
| Use AF3 structure directly in FEP without equilibration | Equilibrate with 50–100 ns MD simulation; confirm pocket geometry is stable before FEP |
| Stop the workflow after getting a predicted structure | Treat structure as Stage 1 input; proceed through docking → generation → ADMET → physics ranking |
| Order synthesis of top Chemistry42 candidates directly | Filter by SA Score <3.5 and ASKCOS route feasibility ≤4 steps before ordering |
| Default to AF3 server for commercial programs | Use Boltz-1 (Apache 2.0) for commercial work; reserve AF3 for academic feasibility studies |
| Generate 10,000 molecules, filter for ADMET at end | Embed ADMET constraints in Chemistry42 or MolMIM generation parameters from cycle 1 |
The compression of the early-stage timeline is real and measurable. What AI has not yet touched — and is unlikely to meaningfully compress in the next 12-18 months — is the fundamental biology of why drugs fail in Phase II and III. Roughly 40% of Phase II failures are attributed to lack of efficacy: the drug hit the target exactly as designed, but hitting that target did not produce the intended therapeutic effect. That failure mode is not a structure prediction problem or a generative chemistry problem. It is a target selection problem, driven by incomplete understanding of disease biology, and no amount of better ML closes that gap.
Within computational chemistry itself, three limitations are worth stating plainly.
First, predicted structures are static, but biology is dynamic. Proteins breathe, flex, and undergo allosteric conformational changes on nanosecond-to-millisecond timescales that AlphaFold 3 cannot capture. Designing a drug against a single predicted conformation risks missing induced-fit binding modes or designing against a conformation the protein only occupies 5% of the time. Molecular dynamics is still required for any serious lead series.
Second, the models are trained on publicly available data, which is heavily biased toward well-studied protein families. For genuinely novel targets (orphan receptors, intrinsically disordered proteins, protein-RNA interactions) the training distribution provides little guidance and accuracy drops sharply.
Third, ADMET models are trained largely on in vitro assay data, so their predictions still fail to capture complex in vivo pharmacokinetics: drug-drug interactions, active transport, first-pass metabolism variation between individuals, and tissue-specific distribution. Predicting an in vitro property is a different problem from predicting clinical ADME.
AI in drug discovery compresses the first two to three years of a twelve-to-fifteen-year process. The tools above are transformative for hit identification and early lead optimization. They do not reduce Phase II efficacy failure rates, address dynamic protein conformations reliably, or replace in vivo pharmacokinetics experiments. Knowing which part of the problem you are actually solving avoids expensive misdirection.
The researchers getting the most out of these tools share one characteristic: they understand what each tool is actually computing, not just what it outputs. Running AlphaFold 3 and accepting the top-ranked model without reading the PAE matrix is like accepting a weather forecast without knowing if the forecast model is calibrated for your geography. The tools above are powerful precisely because they are honest about their uncertainty — pLDDT scores, confidence outputs, SA Score warnings. Learning to read those uncertainty signals is the core competency that separates productive AI-assisted discovery from expensive computational theater.
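Reading the PAE matrix can be as simple as averaging the off-diagonal blocks between two chains. AlphaFold-family tools typically write the matrix to a JSON file alongside the coordinates, but key names vary between tools and versions, so this sketch takes the matrix and per-token chain labels as plain lists. The function name and the demo numbers are hypothetical.

```python
def mean_interface_pae(pae, chain_ids, chain_a, chain_b):
    """Average PAE over both off-diagonal blocks between two chains.

    pae[i][j] is the predicted aligned error for token j when aligned on
    token i; chain_ids[i] is the chain label of token i.
    """
    idx_a = [i for i, c in enumerate(chain_ids) if c == chain_a]
    idx_b = [i for i, c in enumerate(chain_ids) if c == chain_b]
    if not idx_a or not idx_b:
        raise ValueError("both chains must be present in chain_ids")
    values = [pae[i][j] for i in idx_a for j in idx_b]   # A aligned, B scored
    values += [pae[j][i] for i in idx_a for j in idx_b]  # B aligned, A scored
    return sum(values) / len(values)


# Invented 3-token example: two receptor tokens (A) and one ligand token (B)
chain_ids = ["A", "A", "B"]
pae = [
    [0.5, 1.0, 12.0],
    [1.0, 0.5, 14.0],
    [9.0, 11.0, 0.5],
]
print(mean_interface_pae(pae, chain_ids, "A", "B"))
```

In the demo the within-chain errors are all around 1 Å while the inter-chain average is above 10 Å: each molecule is predicted confidently on its own, but their relative placement is not, which is exactly the case a top-ranked model can hide.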
There is also a more fundamental point buried in this whole pipeline. The reason the best drug discovery AI — from DiffDock’s docking accuracy to Chemistry42’s iterative generative cycles — works as well as it does is that drug-target binding has more predictable physics than almost any other biological process. A small molecule either fits the pocket or it does not; the rules are relatively clean. The further you move from that physical core — toward cellular signaling, animal models, human clinical response — the messier the biology becomes and the less AI’s pattern-matching advantages apply. The tools in this article are best understood as physical chemistry accelerators, not universal drug discovery engines.
Human judgment remains essential at two junctures that no current AI handles well. The first is target selection: deciding which protein, in which disease context, is worth pursuing. AlphaFold 3 can tell you the shape of any protein on Earth; it cannot tell you which ones are worth drugging. That requires integrating genetics, phenotypic data, patient stratification, competitive landscape, and clinical feasibility considerations that remain firmly in the domain of experienced researchers. The second is the interpretation of animal study and early clinical data — where the gap between a computationally predicted lead and a therapeutic reality becomes visible, and where experienced clinical judgment about what the data actually means has no current computational substitute.
Where these tools are heading in the next 12-18 months: expect tighter integration between structure prediction and molecular dynamics (AF3-initialized MD as a standard upstream step), multi-target ADMET models trained on clinical outcome data rather than in vitro assays, and generative tools that co-optimize binding affinity, selectivity over related family members, and synthetic accessibility simultaneously in a single objective. Evo’s genomic foundation model also opens the door to designing the biological context of drug targets — not just the molecules that bind them — which is a meaningful expansion of the problem space. The pipeline above will look different in 2028. The underlying principle — that good experimental biology and careful computational chemistry work together — will not.
AlphaFold 3 runs free in your browser. Boltz-1 installs with one pip command. Start with your target sequence and build from there.
Editorial note: Tool capabilities, benchmark figures, and license terms described in this article reflect publicly available information as of Q1 2026. AlphaFold 3 server terms, Boltz-1 benchmarks, Chemistry42 pipeline details, ESM-3 model weights, and NVIDIA BioNeMo NIM availability were verified against official documentation and published preprints. Drug discovery timelines cited (ISM001-055 26-day design, Phase II entry 2023) are drawn from Insilico Medicine’s published disclosures. All tools require independent validation before use in regulated drug development programs. This article constitutes independent editorial commentary and is not affiliated with Google DeepMind, EvolutionaryScale, Insilico Medicine, NVIDIA, or Schrödinger Inc.