Trustworthy computational evidence through transparency and reproducibility
Invited talk at SC20
November 17, 2020
Many high-performance computing applications are of high consequence to society. Global climate modeling is a historic example. In 2020, the societal issue of greatest concern, the still-raging COVID-19 pandemic, saw a legion of computational scientists turn their efforts to new research projects aimed at it. Applications of such high consequence highlight the need to build trustworthy computational models. Emphasizing transparency and reproducibility has helped us build more trust in computational findings. In the context of supercomputing, however, we may ask: how do we trust results from computations that cannot be repeated? Access to supercomputers is limited, allocations are finite, and machines are decommissioned after a few years.
I had the distinction of serving as SC19 Reproducibility Chair and of contributing to the strengthening of this initiative for SC. I was also a member of the National Academies study Committee on Reproducibility and Replicability in Science, which released its report last year. There, reproducibility is defined as "obtaining consistent computational results using the same input data, computational steps, methods, code and conditions of analysis." We should ask how this can be ensured, certified even, without exercising the original digital artifacts. This is often the situation in HPC, and it is now compounded by the greater adoption of machine learning techniques, which can be opaque.
In 2017, the ACM issued the Statement on Algorithmic Transparency and Accountability, targeting algorithmic decision-making using data models. Among its seven principles, it calls for data provenance, auditability, and validation and testing. These principles can be applied not only to data models but to HPC in general.
In this talk, I want to discuss the next steps for reproducibility: how we may adapt our practice to attain what I call unimpeachable provenance, and to achieve full auditability and accountability of scientific evidence produced via computation.