GPU algorithms for Efficient Exascale Discretizations

Parallel Computing - Volume 108 - Page 102841 - 2021
Ahmad Abdelfattah (1), Valeria Barra (2), Natalie Beams (1), Ryan Bleile (3), Jed Brown (2), Jean-Sylvain Camier (4), Robert Carson (5), Noel Chalmers (6), Veselin Dobrev (4), Yohann Dudouit (4), Paul Fischer (7, 8, 9), Ali Karakus (10), Stefan Kerkemeier (7), Tzanio Kolev (4), Yu-Hsiang Lan (7), Elia Merzari (7, 11), Misun Min (7), Malachi Phillips (8), Thilina Rathnayake (8), Robert Rieben (3)
(1) Innovative Computing Laboratory, University of Tennessee, Knoxville, TN 37996, United States of America
(2) Department of Computer Science, University of Colorado, Boulder, CO 80309, United States of America
(3) Weapons and Complex Integration, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States of America
(4) Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States of America
(5) Computational Engineering Division, Lawrence Livermore National Laboratory, Livermore, CA 94550, United States of America
(6) AMD Research, Advanced Micro Devices Inc., Austin, TX 78735, United States of America
(7) Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL 60439, United States of America
(8) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States of America
(9) Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, United States of America
(10) Mechanical Engineering Department, Middle East Technical University, Ankara 06800, Turkey
(11) Department of Nuclear Engineering, Penn State, PA 16802, United States of America
