Predicting Binding Affinity

The affinity of a ligand for its target protein is a crucial measurement in drug discovery, and whilst there are many techniques to measure binding affinity, prediction remains a significant challenge. Hit-finding strategies may occasionally return high-affinity hits, but many yield only modest-affinity starting points, and fragment-based approaches will almost certainly identify only very weak ligands.

Whilst medicinal chemists can optimise binding affinity by a mixture of design and trial and error, the process is often inefficient, requires significant synthetic chemistry resources, and the cycle time between design and test can be very slow. One way to improve the situation would be to predict binding affinity accurately in silico.

Unfortunately, prediction of binding affinity is extremely difficult. Experimentally derived protein structures sometimes lack resolution and have a number of issues:

–Alternates, Residues with alternate locations and/or ambiguous sequence identities (a common choice is to keep the highest occupancy), but remember that proteins are not static.

–Termini, Protein chain C- or N-termini need to be charged or capped; for DNA, the terminal PO4 may only have three oxygens bonded to the phosphorus and an additional oxygen needs to be added. Sometimes loops are very disordered and appear as breaks in the chain; it may be possible to use a loop library to model a replacement.

–Hydrogens, usually not visible and so need to be added/checked. In particular, check hydrogens on heteroatoms, especially in active-site residues where the local environment may influence pKa.

–Ligand, Novel ligands in particular need checking to confirm atoms and bond orders are correct (PDB files do not contain bond information)

–Conformation, check that torsions are reasonable and there are no clashes.

–Charge, It is worth checking the charge on all ionisable groups.

–It can be difficult to be certain of the position of nitrogens in His or the primary amide in Asn, Gln.

–The position of water molecules is often not visible.
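As an illustration of the "Alternates" check in the list above, the following is a minimal sketch, assuming a standard fixed-column PDB file, that keeps a single alternate location per residue by choosing the one with the highest summed occupancy. The function name `select_altlocs` is hypothetical, and dedicated structure-preparation tools handle many more cases (insertion codes, mixed residue identities, partial-occupancy ligands) than this does.

```python
from collections import defaultdict


def select_altlocs(pdb_lines):
    """Keep one alternate location per residue (highest summed occupancy).

    Assumes fixed-column PDB ATOM/HETATM records: altLoc in column 17,
    chain in column 22, residue number + insertion code in columns 23-27,
    occupancy in columns 55-60. Other record types are dropped.
    """
    atoms = [l for l in pdb_lines if l.startswith(("ATOM", "HETATM"))]

    # residue -> altloc label -> total occupancy of atoms carrying that label
    occupancy = defaultdict(lambda: defaultdict(float))
    for line in atoms:
        residue, altloc = (line[21], line[22:27]), line[16]
        if altloc != " ":
            occupancy[residue][altloc] += float(line[54:60])

    kept = []
    for line in atoms:
        residue, altloc = (line[21], line[22:27]), line[16]
        # Atoms without an altloc label are always kept; labelled atoms are
        # kept only if they belong to the best-occupied conformer.
        if altloc == " " or altloc == max(occupancy[residue],
                                          key=occupancy[residue].get):
            kept.append(line)
    return kept
```

Selecting per residue rather than per atom avoids mixing atoms from two different conformers, but it still discards information: the lower-occupancy conformer may be the one relevant to ligand binding.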

AI-based Protein structure prediction and docking

Folding algorithms can provide a protein structure for cases where it has not been derived experimentally, and there are now a number of options:

AlphaFold https://www.nature.com/articles/s41586-019-1923-7
RoseTTAFold https://www.science.org/doi/10.1126/science.abj8754
ESMFold https://www.science.org/doi/10.1126/science.ade2574
OmegaFold https://hpc.nih.gov/apps/OmegaFold.html

You may not even have to generate a structure, since the AlphaFold DB provides open access to over 200 million protein structure predictions.

More recently these tools have developed into co-folding algorithms that incorporate ligands; however, they do not include cofactors, waters etc. These tools have all benefitted from the fantastic resource that is the Protein Data Bank https://www.rcsb.org. The availability of high-quality data is critical to further development, and the OpenBind initiative hopes to use automated chemistry and high-throughput X-ray crystallography to significantly increase the number of publicly available protein-ligand structures over a period of 5 years.

Boltz https://www.biorxiv.org/content/10.1101/2025.06.14.659707v1
AlphaFold 3 https://www.nature.com/articles/s41586-024-07487-w
Chai-1 https://www.biorxiv.org/content/10.1101/2024.10.10.615955v1
RoseTTAFold All-Atom https://www.science.org/doi/10.1126/science.adl2528
Umol https://github.com/patrickbryant1/Umol
SurfDock https://github.com/CAODH/SurfDock

Physics-based Docking

There is a wide variety of docking tools available; a few are listed below.

AutoDock Vina: Commonly used, open-source tool for protein-ligand docking.
Schrödinger Glide: Commercial software, part of an extensive software package.
GOLD (Genetic Optimization for Ligand Docking): Known for flexibility, including protein/ligand flexibility.
OpenEye Scientific Tools: Offers specialized tools for shape-based docking and rapid virtual screening.
DOCK: A widely used program for exploring binding modes.
MOE: Widely used commercial software, part of an extensive software suite.
SMINA: Offshoot of Vina.
SwissDock: Webserver for docking.

However, we should always remember that proteins are not static structures, despite the beautiful images in the PDB. Docking into a rigid protein structure that has a ligand bound in the active site has proved very successful in generating reasonable poses in a reasonably efficient manner. One option is to allow amino-acid side chains to move; this can improve results but adds an order of magnitude to the time taken. Adding conformational flexibility to the backbone further increases the computational cost.

Docking can of course be used to screen very large numbers of compounds, but it can also be used to evaluate a single idea devised by a medicinal chemist. For the former, speed is critical; for the latter, accuracy is more important.
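When docking into a structure that already has a ligand bound, a common first step is to define the search box around that ligand. The sketch below computes a box centre and size from ligand coordinates; the 5 Å padding is an assumed, illustrative default rather than a recommendation, and the function name `docking_box` is hypothetical. Programs such as AutoDock Vina accept a box centre and box size of this kind.

```python
def docking_box(ligand_coords, padding=5.0):
    """Return (center, size) of an axis-aligned box around a bound ligand.

    ligand_coords: iterable of (x, y, z) tuples in Angstrom.
    padding: margin added on every side of the ligand (assumed value).
    """
    xs, ys, zs = zip(*ligand_coords)
    mins = (min(xs), min(ys), min(zs))
    maxs = (max(xs), max(ys), max(zs))
    # Centre of the ligand's bounding box
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    # Ligand extent plus padding on both sides of each axis
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(mins, maxs))
    return center, size
```

A larger box gives the search more freedom but increases run time, which matters far more when screening large libraries than when evaluating a single design idea.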

Scoring Functions

Whilst many of the docking tools perform well when the RMSD of the predicted pose is compared with experimental data, the binding affinity is much less well predicted. Docking scores are perhaps better regarded as confidence in a pose, bearing in mind that it is possible to be very confident about the pose of a modest-affinity compound. This is well summarised by Avery Sader:

Docking can be useful for generating hypotheses of binding modes. But here’s the thing: docking cannot predict binding affinities… Binding affinity depends on the entire free energy landscape, not just a single docking pose. Protein dynamics and conformational entropy are huge factors. Kinetics and residence time also influence binding affinity, but are not accounted for in docking.

The scoring functions from docking tools fail to account for numerous physical effects: protein side-chain and backbone flexibility, waters and ions, alternate protonation states, strain and desolvation effects, the subtle non-covalent interactions between ligand atoms and protein side chains, and water-mediated hydrogen-bonding networks.
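The pose comparison referred to in this section is usually a heavy-atom RMSD between the docked pose and the crystallographic ligand, computed in the receptor frame without superimposing the two. A minimal sketch, assuming both coordinate lists are in the same atom order and ignoring molecular symmetry (which real evaluations should handle):

```python
import math


def pose_rmsd(pose, reference):
    """RMSD (Angstrom) between two poses with identical atom ordering.

    No superposition is applied: both poses are assumed to be in the
    same receptor coordinate frame.
    """
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum((a - b) ** 2
                  for p, r in zip(pose, reference)
                  for a, b in zip(p, r))
    return math.sqrt(squared / len(pose))
```

An RMSD below about 2 Å is a common convention for calling a pose "correct", but a low RMSD says nothing about whether the score tracks affinity.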

It is also worth noting that measuring binding affinity experimentally can be very challenging, and the result can vary depending on the assay conditions. Measuring kinetic parameters (kon, koff) can be very useful, especially when comparing PD and PK.
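The relationship between the kinetic parameters and binding affinity can be made concrete. For a simple one-step binding model, Kd = koff/kon and the standard binding free energy is ΔG° = RT ln(Kd/c°) with c° = 1 M. The sketch below assumes that simple model; the function names and the default temperature of 298.15 K are illustrative choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def kd_from_kinetics(kon, koff):
    """Dissociation constant (M) from kon (M^-1 s^-1) and koff (s^-1)."""
    return koff / kon


def binding_free_energy(kd, temperature=298.15):
    """Standard binding free energy (kcal/mol) from Kd in mol/L (c0 = 1 M)."""
    return R_KCAL * temperature * math.log(kd)


def residence_time(koff):
    """Mean lifetime of the complex (s) for a one-step binding model."""
    return 1.0 / koff
```

For example, kon = 1e6 M^-1 s^-1 and koff = 1e-3 s^-1 correspond to a Kd of 1 nM, a ΔG° of roughly -12.3 kcal/mol at 298 K, and a residence time of 1000 s. Two compounds with the same Kd can have very different residence times, which is why the kinetic parameters are informative in their own right.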

Whilst techniques like Free Energy Perturbation (FEP) can be used to predict binding affinities, the computational cost and time taken can be prohibitive.

Molecular Dynamics

Molecular dynamics (MD) simulates the physical movements of atoms to calculate the binding free energy between a protein and a ligand, offering a dynamic, accurate alternative to static docking. It determines binding affinity by sampling configurations. Proper sampling of the potential energy surface associated with the binding and unbinding of ligands from a protein is important.

MM-GBSA

MM-GBSA usually involves taking snapshots from molecular dynamics (MD) trajectories, calculating the gas-phase energy, and estimating the solvation free energy using the Generalised Born model. It is useful for ranking docking poses, identifying key binding residues, and analysing binding interactions, and it is less computationally demanding than FEP https://pmc.ncbi.nlm.nih.gov/articles/PMC4487606/. Limitations include the implicit solvent approximation and limited conformational sampling.
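The bookkeeping behind an MM-GBSA estimate can be sketched as follows. For each snapshot the free energy of the receptor and of the ligand is subtracted from that of the complex, and the per-snapshot differences are averaged. This sketch assumes the per-snapshot energies (gas-phase molecular mechanics energy plus Generalised Born solvation term) have already been computed by an MM-GBSA tool; the dictionary keys and the function name are hypothetical, and the entropy term is omitted, as it often is in practice.

```python
from statistics import mean, stdev


def mmgbsa_estimate(snapshots):
    """Average binding energy (kcal/mol) over MD snapshots.

    snapshots: list of dicts with keys 'complex', 'receptor', 'ligand',
    each holding E_MM + G_solv for that species in one frame.
    Returns (mean, standard error of the mean); needs >= 2 snapshots.
    """
    per_frame = [s["complex"] - s["receptor"] - s["ligand"]
                 for s in snapshots]
    # Frames from one trajectory are correlated, so this standard error
    # is an optimistic estimate of the true uncertainty.
    sem = stdev(per_frame) / len(per_frame) ** 0.5
    return mean(per_frame), sem
```

Because large energies of similar magnitude are being subtracted, the result is better suited to ranking related compounds or poses than to quoting an absolute affinity.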

A few examples

AMBER MMPBSA.py
Schrödinger Prime MM-GBSA
gmx_MMPBSA (GROMACS)
xTB

Absolute Free Energy Perturbation

Absolute binding free energy (ABFE) perturbation calculates the binding free energy of a single ligand to a protein target, without reference to another ligand.

It is possibly most useful in the early stage of ligand identification, for identifying key interactions, spotting opportunities for hit expansion, and comparing different structural classes. However, it is computationally more challenging.
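Free energy perturbation, whether absolute or relative, rests on estimators that turn sampled energy differences into a free energy difference. The simplest is the Zwanzig (exponential averaging) relation, ΔG = -kT ln⟨exp(-ΔU/kT)⟩, evaluated over configurations sampled in the starting state. The sketch below implements only that single-step estimator as an illustration; production packages generally split the transformation into many intermediate λ windows and use estimators such as BAR or MBAR, so this is not a description of how any of the tools listed here works internally.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def zwanzig_delta_g(delta_u, temperature=298.15):
    """Free energy difference (kcal/mol) by exponential averaging.

    delta_u: energy differences U_target - U_reference (kcal/mol),
    evaluated on configurations sampled from the reference state.
    """
    kt = R_KCAL * temperature
    exponents = [-du / kt for du in delta_u]
    # Log-sum-exp trick keeps the average numerically stable
    largest = max(exponents)
    log_mean = largest + math.log(
        sum(math.exp(x - largest) for x in exponents) / len(exponents))
    return -kt * log_mean
```

The exponential average is dominated by the rare low-energy samples, which is one reason adequate sampling is so important and why these calculations are expensive.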

Schrödinger FEP+
Cresset Flare FEP
OpenFE
NAMD

Relative binding free-energy perturbation (RBFE FEP)

Relative binding free-energy methods compute the change in binding free energy between two ligands by “alchemically” transforming one ligand into another in both solvent and the protein complex. The output is ΔΔG between ligands, so if the starting point is a known ligand with an experimentally determined binding affinity we can predict the binding affinity for novel ligands (usually you would first benchmark by predicting the binding affinity for a series of known ligands).

RBFE FEP is an ideal tool for the ligand optimisation stage of a project, for ranking potential ligand ideas prior to synthesis.
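The arithmetic of the thermodynamic cycle can be sketched in a few lines. The relative binding free energy is the difference between the alchemical transformation free energy in the complex and in solvent, and adding it to the experimental ΔG of the reference ligand gives a prediction for the new ligand. The function names are hypothetical, and the input numbers in the usage note are made up purely for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def relative_binding_ddg(dg_complex, dg_solvent):
    """ddG(A->B) = dG of the A->B transformation in the complex
    minus dG of the same transformation in solvent (kcal/mol)."""
    return dg_complex - dg_solvent


def predict_affinity(dg_reference_exp, ddg):
    """Predicted binding free energy of the new ligand (kcal/mol)."""
    return dg_reference_exp + ddg


def kd_from_dg(dg, temperature=298.15):
    """Convert a standard binding free energy (kcal/mol) to Kd (M)."""
    return math.exp(dg / (R_KCAL * temperature))
```

With illustrative values of -3.2 kcal/mol for the transformation in the complex and -2.0 kcal/mol in solvent, ΔΔG is -1.2 kcal/mol; starting from a reference ligand at -9.0 kcal/mol, the prediction is -10.2 kcal/mol, roughly a 7 to 8-fold improvement in Kd at 298 K. Since differences of this size are what separate analogues in a series, errors of around 1 kcal/mol in the calculation matter.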

Schrödinger FEP+
Cresset Flare FEP
OpenFE
pmx (GROMACS-based)
NAMD FEP

AI/ML-based scoring functions

Whilst physics-based binding affinity predictions have been very useful, they are computationally very costly, and machine learning approaches offer the potential for fast and accurate binding affinity predictions. In addition, the increasing quantities of both binding affinity measurements and high-resolution structural data make AI/ML more attractive. RFscore was one of the early attempts to apply Random Forest to predicting binding affinity [DOI]. Gnina uses convolutional neural network (CNN) scoring functions that work on an atomic density grid representation of the complex [DOI]. As might be expected, AI/ML methods suffer from poorer performance on out-of-distribution (OOD) datasets; hopefully more data will be generated by initiatives such as OpenBind https://openbind.uk.
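To give a feel for what a simple ML scoring function sees, the sketch below builds a feature vector of protein-ligand element-pair contact counts, loosely in the spirit of the descriptors used by RF-Score. The element lists, the 12 Å cutoff and the function name are assumptions for illustration and should not be read as the published implementation.

```python
import math
from itertools import product

# Assumed element sets for this illustration
PROTEIN_ELEMENTS = ("C", "N", "O", "S")
LIGAND_ELEMENTS = ("C", "N", "O", "F", "P", "S", "Cl", "Br", "I")


def contact_counts(protein_atoms, ligand_atoms, cutoff=12.0):
    """Count protein-ligand element pairs closer than `cutoff` Angstrom.

    Each atom is (element, (x, y, z)). Returns a dict keyed by
    (protein_element, ligand_element); other elements are ignored.
    """
    counts = {pair: 0
              for pair in product(PROTEIN_ELEMENTS, LIGAND_ELEMENTS)}
    for p_el, p_xyz in protein_atoms:
        for l_el, l_xyz in ligand_atoms:
            if (p_el, l_el) in counts and math.dist(p_xyz, l_xyz) < cutoff:
                counts[(p_el, l_el)] += 1
    return counts
```

A vector like this could be passed to a random forest regressor (for example scikit-learn's `RandomForestRegressor`) trained on measured affinities. Features this coarse also make it clear why such models can struggle on out-of-distribution targets: they encode what the training complexes looked like rather than the physics of binding.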

AEV-PLIG combines atomic environment vectors (AEVs) with protein–ligand interaction graphs (PLIGs) to learn the relative importance of neighbouring environments [DOI]. In an extensive comparison with FEP, AEV-PLIG gave comparable performance for many targets. Code is available on GitHub https://github.com/oxpig/AEV-PLIG.

As mentioned above, co-folding methods also offer binding affinity predictions, but these are still being optimised.

RFscore
gnina
AEV-PLIG
Boltz-2
DiffDock

Worth Reading

The Affinity Advantage, Mark Murcko https://www.preprints.org/manuscript/202511.1564

AlphaFold Protein Structure Database and 3D-Beacons: New Data and Capabilities [DOI]

Scoring Functions for Protein-Ligand Binding Affinity Prediction Using Structure-based Deep Learning: A Review [DOI]

Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field [DOI]

Best Practices for Alchemical Free Energy Calculations [DOI]

Cambridge MedChem Consulting
