Computational methods for docking ligands to protein binding sites have become ubiquitous in drug discovery. Despite the maturity of the field, no standards have been established for evaluating docking accuracy, virtual screening utility, or scoring accuracy. Critical issues relating to data sharing, data set design and preparation, and statistical reporting affect the degree to which a reported result will translate into real-world performance. These issues also affect whether there is a transparent relationship between methodological changes and reported performance improvements. This paper presents detailed examples of pitfalls in each area and makes recommendations as to best practices.