The Why and How of Reproducible Computational Research
Seminar at IACS, Stony Brook University, 21 Sept. 2017.
Reproducible research hit the mainstream in the last few years, after more than two decades of back-alley campaigns. Feature articles have graced the pages and covers of not only the most prominent science publications, but also the news media. Yet two crucial discussions are seldom clearly captured: why we care, and how to do it. The why is today more important than ever: the success and credibility of science are on the line. Perverse incentives in academia undermine our best intentions, so let’s agree on where our responsibilities lie with reproducibility. The how of reproducible research, on the other hand, can be surprisingly contentious. Should we use spreadsheets (no), point-and-click GUIs (it depends), or version control (yes)? What commitments do we need to make—whether as student, mentor, author, reader, or funder? Often the focus of the conversation has been on open data and code. In my group, we have been practicing open science for years, but we learned the hard way that open code is merely a first step. We need to exhaustively document our computational research, to encourage and accept publication of negative results, and to apply defensive tactics against bad code: version control, modular code, testing, and code review. The tools and methods require training, but if you establish the why, the how will follow.