PKDGRAV3: beyond trillion particle cosmological simulations for the next era of galaxy surveys

Springer Science and Business Media LLC - Volume 4 - Pages 1-13 - 2017
Douglas Potter¹, Joachim Stadel¹, Romain Teyssier¹
¹University of Zurich, Zurich, Switzerland

Abstract

We report on the successful completion of a 2 trillion particle cosmological simulation to $z=0$, run on the Piz Daint supercomputer (CSCS, Switzerland) using more than 4000 GPU nodes for a little less than 80 h of wall-clock time, or 350,000 node hours. Using multiple benchmarks and performance measurements on the Titan supercomputer at the US Oak Ridge National Laboratory, we demonstrate that our code, PKDGRAV3, delivers, to our knowledge, the fastest time-to-solution for large-scale cosmological N-body simulations. This was made possible by using the Fast Multipole Method in conjunction with individual and adaptive particle time steps, both deployed efficiently (and for the first time) on supercomputers with GPU-accelerated nodes. The very low memory footprint of PKDGRAV3 allowed us to run the first-ever benchmark with 8 trillion particles on Titan, and to achieve perfect scaling up to 18,000 nodes and a peak performance of 10 Pflops.
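
The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration, not part of PKDGRAV3: the function names are hypothetical, the inputs are the rounded values quoted above, and the per-node estimates assume that the 8 trillion particle benchmark and the 10 Pflops peak both refer to the 18,000-node Titan configuration, which the abstract does not state explicitly.

```python
# Rounded figures quoted in the abstract ("a little less than 80 h", "more than 4000 nodes").
WALL_CLOCK_HOURS = 80.0      # upper bound on the Piz Daint wall-clock time
NODE_HOURS = 350_000.0       # total node hours of the production run
BENCH_PARTICLES = 8e12       # Titan benchmark particle count
BENCH_NODES = 18_000         # largest Titan node count quoted
BENCH_PEAK_FLOPS = 10e15     # quoted peak performance, 10 Pflops


def implied_nodes(node_hours: float, wall_hours: float) -> float:
    """Average node count implied by a node-hour budget and a wall-clock time."""
    return node_hours / wall_hours


def per_node(total: float, nodes: int) -> float:
    """Evenly divide a global quantity across nodes (assumes a uniform load)."""
    return total / nodes


if __name__ == "__main__":
    # Because the run took slightly less than 80 h, this is a lower bound on the node count.
    print(f"implied nodes     : {implied_nodes(NODE_HOURS, WALL_CLOCK_HOURS):.0f}")
    # Assumption: benchmark particles spread evenly over 18,000 nodes.
    print(f"particles per node: {per_node(BENCH_PARTICLES, BENCH_NODES):.3e}")
    # Assumption: the 10 Pflops peak was reached on 18,000 nodes.
    print(f"Tflops per node   : {per_node(BENCH_PEAK_FLOPS, BENCH_NODES) / 1e12:.2f}")
```

Under these assumptions the node-hour budget implies about 4375 nodes, consistent with "more than 4000", and the Titan benchmark corresponds to roughly 4.4 × 10^8 particles and about 0.56 Tflops per node.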
