Binding affinity prediction is frequently addressed using computational models constructed solely from molecular structure and activity data. We present a hybrid structure-guided strategy that combines molecular similarity, docking, and multiple-instance learning so that information from protein structures can inform models of structure–activity relationships. The Surflex-QMOD approach has been shown to produce accurate predictions of binding affinity by constructing an interpretable physical model of a binding site without any experimental structural information about that site.
We introduce a method to integrate protein structure information into the model induction process to construct more robust physical models. The structure-guided models accurately predict binding affinities over a broad range of compounds while producing more accurate representations of the protein pockets and ligand binding modes. Structure guidance for the QMOD method yielded significant performance improvements for both affinity and pose prediction, especially when predictions were made on ligands very different from those used for model induction.