GPU computing

Rio Yokota, L. A. Barba and Tsuyoshi Hamada posing next to Degima, the do-it-yourself GPU cluster in Nagasaki, 2010

In the past few years, computational science has seen a paradigm shift in hardware architectures. Faced with bottlenecks in memory, power and complexity, the IT industry has turned to on-chip parallelism. As a result, further increases in performance for simulation science will require parallel computing.

A compelling new trend is the use of graphics processing units (GPUs) for scientific computing. It is perhaps the most exciting development since the debut of the Beowulf cluster in the ’90s.

With this opportunity, however, comes the challenge of adapting our large toolbox of algorithms to the changes in computer architecture. There is a continuing need for research into algorithms that exploit the new hardware, and our group has been working in this area since 2007.

See the screencast of Prof. Barba's talk at the GPU Technology Conference, September 2010.

References

"GPU@BU—GPU computing at Boston University", L. A. Barba. (29 November 2012). doi:10.6084/m9.figshare.98875
A short history of GPU computing at Boston University, made into a handout for the BU booth at the Supercomputing Conference, Salt Lake City, November 2012. Published on figshare under CC-BY.
Invited: "The triad of extreme computing—fast algorithms, open software and heterogeneous systems", L. A. Barba. GPU Technology Conference organized by NVIDIA (20–23 September 2010), San Jose, CA.
"cuIBM—A GPU accelerated immersed boundary method", Simon K. Layton, Anush Krishnan, L. A. Barba. 23rd International Conference on Parallel Computational Fluid Dynamics, ParCFD’11 (16–20 May 2011), Barcelona, Spain. // Code repository
"Treecode and fast multipole method for N-body simulation with CUDA", Rio Yokota, L. A. Barba. (2011). doi:10.1016/B978-0-12-384988-5.00009-7 // Preprint arXiv:1010.1482
Ch. 9 in GPU Computing Gems Emerald Edition, Wen-mei Hwu, ed.; Morgan Kaufmann/Elsevier (2011) pp. 113–132. ISBN: 978-0-12-384988-5
"Biomolecular electrostatics using a fast multipole BEM on up to 512 GPUs and a billion unknowns", Rio Yokota, J. P. Bardhan, M. G. Knepley, L. A. Barba, T. Hamada. Comput. Phys. Commun., 182(6):1271–1283 (June 2011). doi:10.1016/j.cpc.2011.02.013 // Preprint arXiv:1007.4591
"How to obtain efficient GPU kernels: an illustration using FMM & FGT", Felipe A. Cruz, Simon K. Layton, L. A. Barba. Comput. Phys. Commun., 182(10):2084–2098 (October 2011). doi:10.1016/j.cpc.2011.05.002 // Preprint arXiv:1009.3457