Fast Matrix-Free Evaluation of Discontinuous Galerkin Finite Element Operators

ACM Transactions on Mathematical Software - Volume 45, Issue 3 - Pages 1-40 - 2019
Martin Kronbichler¹, Katharina Kormann²
¹ Technical University of Munich, Germany
² Max Planck Institute for Plasma Physics and Technical University of Munich, Germany

Abstract

We present an algorithmic framework for matrix-free evaluation of discontinuous Galerkin finite element operators. It relies on fast quadrature with sum factorization on quadrilateral and hexahedral meshes, targeting general weak forms of linear and nonlinear partial differential equations. Different algorithms and data structures are compared in an in-depth performance analysis. The implementations of the local integrals are optimized by vectorization over several cells and faces and an even-odd decomposition of the one-dimensional interpolations. Up to 60% of the arithmetic peak on Intel Haswell, Broadwell, and Knights Landing processors is reached when running from caches and up to 40% of peak when also considering the access to vectors from main memory. On 2×14 Broadwell cores, the throughput is up to 2.2 billion unknowns per second for the 3D Laplacian and up to 4 billion unknowns per second for the 3D advection on affine geometries, close to a simple copy operation at 4.7 billion unknowns per second. Our experiments show that MPI ghost exchange has a considerable impact on performance and we present strategies to mitigate this effect. Finally, various options for evaluating geometry terms and their performance are discussed. Our implementations are publicly available through the deal.II finite element library.
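
The two kernel-level ideas named in the abstract, sum factorization and the even-odd decomposition of the one-dimensional interpolations, can be illustrated with a minimal pure-Python sketch. It is not the deal.II implementation: the function names, the equidistant Lagrange nodes, and the 2D setting are assumptions chosen for brevity, and no vectorization is attempted. The first pair of functions interpolates a tensor-product expansion to quadrature points either through the full Kronecker-product sum or through two one-dimensional sweeps; the last function applies a single one-dimensional interpolation using the mirror symmetry S[q][i] = S[nq-1-q][n-1-i], which holds when nodes and points are symmetric about the interval midpoint.

```python
def lagrange_matrix(nodes, points):
    """Return S with S[q][i] = i-th Lagrange polynomial on `nodes` at points[q]."""
    S = []
    for x in points:
        row = []
        for i, xi in enumerate(nodes):
            val = 1.0
            for j, xj in enumerate(nodes):
                if j != i:
                    val *= (x - xj) / (xi - xj)
            row.append(val)
        S.append(row)
    return S


def interpolate_naive(S, coeffs):
    """2D interpolation via the full tensor-product sum: O(nq^2 * n^2) work."""
    nq, n = len(S), len(S[0])
    return [[sum(S[q1][i1] * S[q2][i2] * coeffs[i1][i2]
                 for i1 in range(n) for i2 in range(n))
             for q2 in range(nq)] for q1 in range(nq)]


def interpolate_sum_factorized(S, coeffs):
    """Same result through two 1D sweeps: O(n^2 * nq + n * nq^2) work."""
    nq, n = len(S), len(S[0])
    # Sweep 1: contract the second index, tmp[i1][q2] = sum_i2 S[q2][i2] c[i1][i2].
    tmp = [[sum(S[q2][i2] * coeffs[i1][i2] for i2 in range(n))
            for q2 in range(nq)] for i1 in range(n)]
    # Sweep 2: contract the first index, out[q1][q2] = sum_i1 S[q1][i1] tmp[i1][q2].
    return [[sum(S[q1][i1] * tmp[i1][q2] for i1 in range(n))
             for q2 in range(nq)] for q1 in range(nq)]


def apply_even_odd(S, x):
    """Compute y = S x for a mirror-symmetric S using even and odd parts of x.

    Assumes S[q][i] == S[nq-1-q][n-1-i]. A production kernel would precompute
    the combined matrices (S[q][i] +/- S[q][n-1-i]) once; they are formed on
    the fly here only to keep the sketch short.
    """
    nq, n = len(S), len(S[0])
    half = n // 2
    xe = [0.5 * (x[i] + x[n - 1 - i]) for i in range(half)]  # even part
    xo = [0.5 * (x[i] - x[n - 1 - i]) for i in range(half)]  # odd part
    y = [0.0] * nq
    for q in range((nq + 1) // 2):
        even = sum((S[q][i] + S[q][n - 1 - i]) * xe[i] for i in range(half))
        odd = sum((S[q][i] - S[q][n - 1 - i]) * xo[i] for i in range(half))
        if n % 2 == 1:
            even += S[q][half] * x[half]  # middle coefficient is purely even
        # Each even/odd pair yields two outputs: rows q and nq-1-q.
        y[q] = even + odd
        y[nq - 1 - q] = even - odd
    return y


if __name__ == "__main__":
    nodes = [i / 3 for i in range(4)]            # degree-3 basis, assumed equidistant
    points = [(q + 0.5) / 5 for q in range(5)]   # illustrative symmetric points
    S = lagrange_matrix(nodes, points)
    coeffs = [[a * a * b + b for b in nodes] for a in nodes]
    fast = interpolate_sum_factorized(S, coeffs)
    slow = interpolate_naive(S, coeffs)
    print(max(abs(fast[i][j] - slow[i][j]) for i in range(5) for j in range(5)))
```

In this sketch the even-odd variant roughly halves the number of multiplications per one-dimensional sweep once the combined matrices are precomputed, and the factorized sweeps replace one large contraction by a sequence of small ones; the same idea extends to 3D with three sweeps.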
