12 Ways to Fool the Masses with Irreproducible Results
Keynote at the IEEE International Parallel and Distributed Processing Symposium, May 19, 2021
Thirty years ago, David Bailey published a humorous piece in the Supercomputing Review magazine, listing 12 ways of presenting results to artificially boost performance claims. That was at a time when the debate was between Cray "two-oxen" machines and parallel "thousand-chickens" systems, when parallel standards (like MPI) were still unavailable, and the Top500 list didn't yet exist. In the years since, David and others have updated the list of tricks a few times, notably in 2010–11 (when the marketing departments of Intel and Nvidia were really going at each other), by Georg Hager in his blog and Scott Pakin in HPC Wire. Heterogeneity of computing systems has only escalated in the last decade, and many remiss reporting tactics continue unabated. Alas, two new ingredients have entered the mix: wide adoption of machine learning techniques in both science applications and systems research, and a swell of concern over reproducibility and replicability. My talk will be a new twist on the 12 ways to fool the masses, focusing on how researchers in computational science and high-performance computing miss the mark on reproducibility when conducting their research or reporting their results. By showcasing a set of anti-patterns in a lighthearted manner, I aim to encourage us to see the value of reproducibility and commit to adapting our practice to achieve more trustworthy scientific evidence with high-performance computing.