Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster

Xin-Duo Liu, Wei-Jia He, Ming-Lin Yang, Xin-Qing Sheng
Center for Electromagnetic Simulation, Beijing Institute of Technology, Beijing, China

Abstract

This paper presents a massively parallel implementation of the multilevel fast multipole algorithm (PMLFMA) on China's homegrown SW26010 many-core cluster, denoted SW-PMLFMA, for 3-D electromagnetic scattering problems. In this approach, the multilevel fast multipole algorithm (MLFMA) octree is first partitioned among the management processing elements (MPEs) of the SW26010 processors, following the ternary partitioning scheme and using the message passing interface (MPI). Then the computationally intensive parts of the PMLFMA on each MPI process (matrix filling, aggregation, and disaggregation) are accelerated with all 64 computing processing elements (CPEs) in the same core group as the MPE, via the Athread parallel programming model. Different parallelization strategies are designed for the many-core accelerators to ensure high computational throughput. To match the local character of Lagrange interpolation, the interpolation matrices are stored in compressed sparse row (CSR) format and the anterpolation matrices in compressed sparse column (CSC) format. These are combined with a specially designed cache mechanism of hybrid dynamic and static buffers in the scratchpad memory (SPM) to improve data access efficiency. Numerical results demonstrate the efficiency and versatility of the proposed method and show that the parallel scheme achieves excellent speedup.
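The pairing of CSR for interpolation and CSC for anterpolation can be illustrated with a minimal sketch. It assumes, as is common in MLFMA but not stated in the abstract, that anterpolation is the transpose of interpolation, so the same three arrays serve both operations. The function names, the toy 4x3 matrix, and its weights are hypothetical and are not taken from the paper; the code is plain Python rather than Athread code for the CPEs.

```python
def csr_matvec(indptr, indices, data, x):
    """y = A @ x with A in CSR: each output row gathers a few inputs."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(indptr) - 1):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def csc_matvec(indptr, indices, data, x, n_rows):
    """y = B @ x with B in CSC: each input column scatters to a few outputs."""
    y = [0.0] * n_rows
    for col in range(len(indptr) - 1):
        for k in range(indptr[col], indptr[col + 1]):
            y[indices[k]] += data[k] * x[col]
    return y


# Hypothetical 4x3 interpolation matrix with two nonzeros per row,
# mimicking the few-neighbour stencil of local Lagrange interpolation.
indptr = [0, 2, 4, 6, 8]
indices = [0, 1, 0, 1, 1, 2, 1, 2]
data = [0.75, 0.25, 0.25, 0.75, 0.75, 0.25, 0.25, 0.75]

coarse = [1.0, 2.0, 4.0]  # samples on the coarse grid

# Interpolation: CSR rows are contiguous, one row per fine-grid sample.
fine = csr_matvec(indptr, indices, data, coarse)

# Anterpolation (assumed transpose): the same arrays read as the CSC
# form of the 3x4 transposed matrix, so no second copy is stored.
back = csc_matvec(indptr, indices, data, fine, n_rows=3)
```

In this sketch both operations traverse the nonzeros in the same contiguous order, which is one plausible reason such a layout suits a small software-managed buffer like the SPM; the paper's actual buffering scheme is not reproduced here.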
