SAMPL7 protein-ligand challenge: A community-wide evaluation of computational methods against fragment screening and pose-prediction

Grosjean H, Isik M, Aimon A, Mobley D, Chodera JD, von Delft F, and Biggin PC
Journal of Computer-Aided Molecular Design 36:291, 2022 [DOI]

We field a blind community challenge to assess how well state-of-the-art computational chemistry methods can predict the binding modes of small druglike fragments to PHIP2, a protein target for which no chemical matter is known, using fragment screening at the Diamond Light Source.

Overview of the SAMPL6 pKa challenge: evaluating small molecule microscopic and macroscopic pKa predictions

Mehtap Işık, Ariën S Rustenburg, Andrea Rizzi, Marilyn R Gunner, David L Mobley, John D Chodera
Journal of Computer-Aided Molecular Design 35:131, 2021
[DOI] [bioRxiv] [GitHub] [manuscript and figure sources]

The SAMPL6 pKa challenge assessed the ability of the computational chemistry community to predict macroscopic and microscopic pKas for a set of druglike molecules resembling kinase inhibitors. This paper reports on the overall performance and lessons learned, including the surprising finding that many tools predict reasonably accurate macroscopic pKas corresponding to the wrong microscopic protonation sites.

Standard state free energies, not pKas, are ideal for describing small molecule protonation and tautomeric states

M R Gunner, Taichi Murakami, Ariën S. Rustenburg, Mehtap Işık, and John D. Chodera.
Journal of Computer-Aided Molecular Design 34:561, 2020. [DOI] [PDF] [GitHub]

Here, we demonstrate how the physical nature of protonation and tautomeric state effects means that the standard state free energies of each microscopic protonation/tautomeric state at a single pH are sufficient to describe the complete pH-dependent microscopic and macroscopic populations. We introduce a new kind of diagram that uses this concept to illustrate a variety of pH-dependent phenomena, and show how it can be used to identify common issues with protonation state prediction algorithms. As a result, we recommend that future blind prediction challenges use microstate free energies at a single reference pH as the minimal sufficient information for assessing prediction accuracy and utility.
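
As an illustration of why a single reference pH is enough, the sketch below shifts each microstate free energy by its proton count and renormalizes the Boltzmann weights. It is a minimal example and not code from the paper: the function name `microstate_populations`, the choice of reduced units (kT), and the convention that proton counts are measured relative to an arbitrary reference microstate are all assumptions made here.

```python
import math


def microstate_populations(free_energies_kT, proton_counts, pH, pH_ref):
    """Return the population of each microstate at the requested pH.

    free_energies_kT: microstate free energies at pH_ref, in units of kT
                      (assumed convention for this sketch).
    proton_counts:    number of bound protons in each microstate, relative
                      to any common reference microstate.
    """
    ln10 = math.log(10.0)
    # Raising the pH by one unit penalizes each bound proton by kT * ln(10).
    shifted = [
        g + n * ln10 * (pH - pH_ref)
        for g, n in zip(free_energies_kT, proton_counts)
    ]
    # Subtract the minimum before exponentiating for numerical stability.
    g_min = min(shifted)
    weights = [math.exp(-(g - g_min)) for g in shifted]
    total = sum(weights)
    return [w / total for w in weights]


# Two microstates with equal free energy at pH 7 (i.e. a pKa of 7):
# index 0 is deprotonated, index 1 carries one extra proton.
populations = microstate_populations([0.0, 0.0], [0, 1], pH=8.0, pH_ref=7.0)
```

In this two-state case the protonated fraction at pH 8 follows the familiar Henderson-Hasselbalch ratio of 1:10, but the same function handles any number of microstates without defining individual pKas.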

Measuring experimental cyclohexane-water distribution coefficients for the SAMPL5 challenge

Ariën S. Rustenburg, Justin Dancer, Baiwei Lin, Jianweng A. Feng, Daniel F. Ortwine, David L. Mobley, and John D. Chodera.
Journal of Computer-Aided Molecular Design 30:945, 2016. [DOI] [bioRxiv] [PDF] // data: [GitHub]
Solicited manuscript for special issue of the Journal of Computer-Aided Molecular Design on the SAMPL5 Challenge.

The SAMPL Challenges have driven predictive physical modeling for ligand:protein binding forward by focusing the community on a series of blind challenges that evaluate performance on blind datasets, focus attention on current challenges for physical modeling techniques, and provide high-quality experimental datasets to the community after the challenge is over. For many years, challenges focused on hydration free energies have proven to be extremely useful, with theory now able to determine when experiment is wrong. To replace these challenges, since no more hydration free energy data is being measured, we proposed to use the partition or distribution coefficients of small druglike molecules between aqueous and apolar phases. We report the collection of cyclohexane-water partition data for a set of compounds used to drive the SAMPL5 distribution coefficient challenge, providing the experimental data, methodology, and insight for future iterations of this challenge.

Modeling error in experimental assays using the bootstrap principle: Understanding discrepancies between assays using different dispensing technologies

Sonya M. Hanson, Sean Ekins, and John D. Chodera.
Journal of Computer-Aided Molecular Design 29:1073, 2015. [DOI] [PDF] // IPython notebook [GitHub] // preprint: [bioRxiv]
Inspired by this In the Pipeline blog post

The drug development community faced a puzzling challenge when a disturbing paper published in PLoS One demonstrated that results from the same assay performed with different dispensing technologies both varied wildly and differed significantly in the magnitude of reported potencies. Inspired by a talk given at the 2014 CADD GRC by Cosma Shalizi on bootstrapping to model error, we show how this simple idea can help explain a large amount of the discrepancy in this assay, and provide simple mathematical tools and an IPython notebook illustrating how easy it is to model the error and bias in experimental assays even when other information about assay reliability is unavailable.
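
For readers unfamiliar with the bootstrap principle, the sketch below shows the generic nonparametric form: resample the data with replacement, recompute a statistic each time, and read an interval off the resulting distribution. This is not the assay error model from the paper or its notebook. The function name `bootstrap_interval`, its parameters, and the example measurements are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_interval(data, statistic=statistics.mean,
                       n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a statistic of the data.

    Returns (lower, upper) bounds covering roughly 1 - alpha of the
    bootstrap distribution of the statistic.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n points with replacement from the original measurements.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    estimates.sort()
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = min(int((1 - alpha / 2) * n_resamples), n_resamples - 1)
    return estimates[lower_index], estimates[upper_index]


# Hypothetical replicate measurements (arbitrary units).
measurements = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
lower, upper = bootstrap_interval(measurements)
```

The width of the interval gives a data-driven estimate of uncertainty in the statistic without assuming a particular error distribution, which is the property that makes the approach attractive when little is known about assay reliability.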