Automatic equilibration detection manuscript on bioRxiv

In molecular simulations---especially simulations of complex systems like biomolecules---it's incredibly difficult to start the simulation close enough to equilibrium to avoid initial transients in properties of interest.  As a result, it is almost universally recommended that some initial portion of the simulation be discarded to "equilibration".  Unfortunately, there hasn't been a simple, automated, and generally applicable way to do this that is standard practice in the field.

In a new manuscript draft posted to bioRxiv this morning, I show how a remarkably simple approach (maximizing the number of statistically uncorrelated samples in the latter part of the simulation) can lead to a surprisingly robust and useful algorithm for equilibration detection.  This is very much a work in progress, so comments and feedback are greatly appreciated!
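To give a feel for the idea, here is a minimal sketch in Python. It is not the code from the manuscript or its repository; the function names, the `stride` parameter, and the simple autocorrelation estimator (summing the normalized autocorrelation until it first drops to zero or below) are all assumptions made for illustration. For each candidate equilibration time `t0`, it estimates the statistical inefficiency `g` of the remaining samples and keeps the `t0` that maximizes the effective number of uncorrelated samples, `(n - t0) / g`.

```python
import numpy as np


def statistical_inefficiency(x):
    """Rough estimate of g = 1 + 2 * (sum of normalized autocorrelation).

    Illustrative estimator: the sum is truncated at the first lag where the
    autocorrelation is no longer positive.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0  # constant series: treat every sample as independent
    g = 1.0
    for lag in range(1, n - 1):
        c = np.dot(dx[:-lag], dx[lag:]) / ((n - lag) * var)
        if c <= 0.0:
            break
        g += 2.0 * c * (1.0 - lag / n)
    return max(g, 1.0)


def detect_equilibration(x, stride=1):
    """Return (t0, g, n_eff) maximizing n_eff = (n - t0) / g over candidate t0.

    `stride` (hypothetical parameter) only thins the candidate t0 values to
    keep the scan cheap.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    best_t0, best_g, best_neff = 0, 1.0, 0.0
    for t0 in range(0, n - 1, stride):
        g = statistical_inefficiency(x[t0:])
        neff = (n - t0) / g
        if neff > best_neff:
            best_t0, best_g, best_neff = t0, g, neff
    return best_t0, best_g, best_neff


# Usage on synthetic data: a decaying initial transient plus white noise.
rng = np.random.default_rng(0)
steps = np.arange(500)
series = 10.0 * np.exp(-steps / 50.0) + rng.normal(size=steps.size)
t0, g, neff = detect_equilibration(series, stride=10)
print(f"discard first {t0} samples; g = {g:.2f}; effective samples = {neff:.1f}")
```

Everything before `t0` would be discarded to equilibration, and the remaining samples subsampled roughly every `g` steps if uncorrelated samples are needed.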

DOI: http://dx.doi.org/10.1101/021659

All code needed to grab the exact versions of the tools I used (using the conda package installer and the omnia molecular simulation suite), generate the simulation data, analyze it, and generate the figures for the paper is available on GitHub.  You simply need to run

./reproduce.sh

to regenerate everything, which is exactly what I did to generate the figures in the posted version of the manuscript.  There are still a few improvements I hope to make so that the scripts are easier to read and the data easier to deal with, but hopefully we can attain this level of ultra-simple reproducibility in future work as well.

Update [5 July 2015]: The manuscript has been updated based on valuable feedback I've already received!  Thanks to everyone who has made comments!