Fine-grained GPU implementation of assembly-free iterative solver for finite element problems

Computers & Structures - Volume 157 - Pages 9-18 - 2015
Jesús Martínez-Frutos¹, Pedro J. Martínez-Castejón¹, David Herrero-Pérez¹
¹ Department of Structures and Construction, Technical University of Cartagena, Campus Muralla del Mar, 30202 Cartagena (Murcia), Spain

References

Pratx, 2011, GPU computing in medical physics: a review, Med Phys, 38, 2685, 10.1118/1.3578605
Brodtkorb, 2013, Graphics processing unit (GPU) programming strategies and trends in GPU computing, J Parallel Distrib Comput, 73, 4, 10.1016/j.jpdc.2012.04.003
Wulf, 1995, Hitting the memory wall: implications of the obvious, ACM SIGARCH Comput Architect News, 23, 20, 10.1145/216585.216588
Asanovic K, Bodik R, Catanzaro BC, Gebis JJ, Keutzer K, Patterson DA, et al. The landscape of parallel computing research: a view from Berkeley. Technical Report, UC Berkeley; 2006.
Asanovic, 2009, A view of the parallel computing landscape, Commun ACM, 52, 56, 10.1145/1562764.1562783
Uk B, Taufer M, Stricker T, Settanni G, Cavalli A, Caflisch A. Combining task- and data parallelism to speed up protein folding on a desktop grid platform. In: Proc of IEEE/ACM conf of cluster computing and the grid. Tokyo, Japan; 2003. p. 240–7.
Kindratenko, 2011, Trends in high-performance computing, Comput Sci Eng, 13, 92, 10.1109/MCSE.2011.52
Habich, 2013, Performance engineering for the lattice Boltzmann method on GPGPUs: architectural requirements and performance results, Comput Fluids, 80, 276, 10.1016/j.compfluid.2012.02.013
Michéa, 2010, Accelerating a three-dimensional finite-difference wave propagation code using GPU graphics cards, Geophys J Int, 182, 389
Corrigan, 2011, Running unstructured grid-based CFD solvers on modern graphics hardware, Int J Numer Methods Fluids, 66, 221, 10.1002/fld.2254
Yokota, 2012, Hierarchical N-body simulations with autotuning for heterogeneous systems, Comput Sci Eng, 14, 30, 10.1109/MCSE.2012.1
Schaller, 1997, Moore’s law: past, present and future, IEEE Spectr, 34, 53, 10.1109/6.591665
Venkataraman, 2004, Structural optimization complexity: what has Moore’s law done for us?, Struct Multidiscip Optim, 28, 375, 10.1007/s00158-004-0415-y
Bathe, 2007
Gullerud, 2001, MPI-based implementation of a PCG solver using an EBE architecture and preconditioner for implicit, 3-D finite element analysis, Comput Struct, 79, 553, 10.1016/S0045-7949(00)00153-X
Mackie, 2008, Object-oriented programming of distributed iterative equation solvers, Comput Struct, 86, 511, 10.1016/j.compstruc.2007.05.003
Carey, 1986, Element-by-element linear and nonlinear solution schemes, Commun Appl Numer Methods, 2, 145, 10.1002/cnm.1630020205
Göddeke, 2007, Exploring weak scalability for FEM calculations on a GPU-enhanced cluster, Parallel Comput, 33, 685, 10.1016/j.parco.2007.09.002
Dziekonski, 2012, Finite element matrix generation on a GPU, Prog Electromagnetics Res, 128, 249, 10.2528/PIER12040301
Fu, 2014, Architecting the finite element method pipeline for the GPU, J Comput Appl Math, 257, 195, 10.1016/j.cam.2013.09.001
Georgescu, 2013, GPU acceleration for FEM-based structural analysis, Arch Comput Method Eng, 20, 111, 10.1007/s11831-013-9082-8
Ament M, Knittel G, Weiskopf D, Strasser W. A parallel preconditioned conjugate gradient solver for the Poisson problem on a multi-GPU platform. In: Proc of IEEE conf on parallel, distributed and network-based processing. Pisa, Italy; 2010. p. 583–92.
Dehnavi, 2011, Enhancing the performance of conjugate gradient solvers on graphic processing units, IEEE Trans Magn, 47, 1162, 10.1109/TMAG.2010.2081662
Knibbe, 2011, GPU implementation of a Helmholtz Krylov solver preconditioned by a shifted Laplace multigrid method, J Comput Appl Math, 236, 281, 10.1016/j.cam.2011.07.021
Helfenstein, 2012, Parallel preconditioned conjugate gradient algorithm on GPU, J Comput Appl Math, 236, 3584, 10.1016/j.cam.2011.04.025
Galiano, 2012, GPU-based parallel algorithms for sparse nonlinear systems, J Parallel Distrib Comput, 72, 1098, 10.1016/j.jpdc.2011.10.016
Li, 2013, GPU-accelerated preconditioned iterative linear solvers, J Supercomput, 63, 443, 10.1007/s11227-012-0825-3
Huthwaite, 2014, Accelerated finite element elastodynamic simulations using the GPU, J Comput Phys, 257, 687, 10.1016/j.jcp.2013.10.017
Kiss I, Badics Z, Gyimóthy S, Pávó J. High locality and increased intra-node parallelism for solving finite element models on GPUs by novel element-by-element implementation. In: Proc of IEEE conf on high performance extreme computing (HPEC). Waltham, MA, USA; 2012. p. 1–5.
Kiss, 2012, Parallel realization of the element-by-element FEM technique by CUDA, IEEE Trans Magn, 48, 507, 10.1109/TMAG.2011.2175905
Cai, 2013, A parallel node-based solution scheme for implicit finite element method using GPU, Procedia Eng, 61, 318, 10.1016/j.proeng.2013.08.022
Suresh, 2013, Efficient generation of large-scale Pareto-optimal topologies, Struct Multidiscip Optim, 47, 49, 10.1007/s00158-012-0807-3
Zegard, 2013, Toward GPU accelerated topology optimization on unstructured meshes, Struct Multidiscip Optim, 48, 473, 10.1007/s00158-013-0920-y
Arbenz, 2008, A scalable multi-level preconditioner for matrix-free μ-finite element analysis of human bone structures, Int J Numer Methods Eng, 73, 927, 10.1002/nme.2101
Liu, 2007, A distributed memory parallel element-by-element scheme based on Jacobi-conditioned conjugate gradient for 3D finite element analysis, Finite Elements Anal Des, 43, 494, 10.1016/j.finel.2006.12.007
Van Rietbergen, 1996, Computational strategies for iterative solutions of large FEM applications employing voxel data, Int J Numer Methods Eng, 39, 2743, 10.1002/(SICI)1097-0207(19960830)39:16<2743::AID-NME974>3.0.CO;2-A
Müller E, Guo X, Scheichl R, Shi S. Matrix-free GPU implementation of a preconditioned conjugate gradient solver for anisotropic elliptic PDEs. Tech Rep 1302.7193v1; Cornell University; 2013.