Massive parallelization of multilevel fast multipole algorithm for 3-D electromagnetic scattering problems on SW26010 many-core cluster
Springer Science and Business Media LLC - Trang 1-17 - 2023
Tóm tắt
This paper presents a massively parallel approach of the multilevel fast multipole algorithm (PMLFMA) on homegrown many-core SW26010 cluster of China, noted as (SW-PMLFMA), for 3-D electromagnetic scattering problems. In this approach, the multilevel fast multipole algorithm (MLFMA) octree is first partitioned among management processing elements (MPEs) of SW26010 processors following the ternary partitioning scheme using the message passing interface (MPI). Then, the computationally intensive parts of the PMLFMA on each MPI process, matrix filling, aggregation and disaggregation are accelerated by using all the 64 computing processing elements (CPEs) in the same core group of the MPE via the Athread parallel programming model. Different parallelization strategies are designed for many-core accelerators to ensure a high computational throughput. In coincidence with the special characteristic of local Lagrange interpolation, the compressed sparse row (CSR) and the compressed sparse column (CSC) sparse matrix storage format is used for storing interpolation and anterpolation matrices, respectively, together with a specially designed cache mechanism of hybrid dynamic and static buffers using the scratchpad memory (SPM) to improve data access efficiency. Numerical results are included to demonstrate the efficiency and versatility of the proposed method. The proposed parallel scheme is shown to have excellent speedup.
Tài liệu tham khảo
Coifman R, Rokhlin V, Wandzura S (1993) The fast multipole method for the wave equation: a pedestrian prescription. IEEE Antennas Propag Mag 35(3):7–12. https://doi.org/10.1109/74.250128
Rokhlin V (1993) Diagonal forms of translation operators for the Helmholtz equation in three dimensions. Appl Comput Harmon Anal 1(1):82–93. https://doi.org/10.1006/acha.1993.1006
Song J, Lu C-C, Chew WC (1997) Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects. IEEE Trans Antennas Propag 45(10):1488–1493. https://doi.org/10.1109/8.633855
Velaparambil S, Chew WC, Song J (2003) 10 million unknowns: is it that big? IEEE Antennas Propag Mag 45(2):43–58
Gürel L, Ergül Ö (2007) Fast and accurate solutions of extremely large integral-equation problems discretised with tens of millions of unknowns. Electron Lett 43(9):499–500
Waltz C, Sertel K, Carr MA, Usner BC, Volakis JL (2007) Massively parallel fast multipole method solutions of large electromagnetic scattering problems. IEEE Trans Antennas Propag 55(6):1810–1816. https://doi.org/10.1109/TAP.2007.898511
Pan X-M, Sheng X-Q (2008) A sophisticated parallel mlfma for scattering by extremely large targets [em programmer’s notebook]. IEEE Antennas Propag Mag 50(3):129–138. https://doi.org/10.1109/MAP.2008.4563583
Ergul O, Gurel L (2009) A hierarchical partitioning strategy for an efficient parallelization of the multilevel fast multipole algorithm. IEEE Trans Antennas Propag 57(6):1740–1750. https://doi.org/10.1109/TAP.2009.2019913
Fostier J, Olyslager F (2008) An asynchronous parallel mlfma for scattering at multiple dielectric objects. IEEE Trans Antennas Propag 56(8):2346–2355. https://doi.org/10.1109/TAP.2008.926787
Taboada J, Araujo MG, Bertolo JM, Landesa L, Obelleiro F, Rodriguez JL (2010) Mlfma-fft parallel algorithm for the solution of large-scale problems in electromagnetics. Progr Electromagn Res 105(8):15–30. https://doi.org/10.2528/PIER10041603
Melapudi V, Shanker B, Seal S, Aluru S (2011) A scalable parallel wideband mlfma for efficient electromagnetic simulations on large scale clusters. IEEE Trans Antennas Propag 59(7):2565–2577. https://doi.org/10.1109/TAP.2011.2152311
Pan X-M, Pi W-C, Yang M-L, Peng Z, Sheng X-Q (2012) Solving problems with over one billion unknowns by the mlfma. IEEE Trans Antennas Propag 60(5):2571–2574. https://doi.org/10.1109/TAP.2012.2189746
Taboada JM, Araujo MG, Basteiro FO, Rodriguez JL, Landesa L (2013) Mlfma-fft parallel algorithm for the solution of extremely large problems in electromagnetics. Proc IEEE 101(2):350–363. https://doi.org/10.1109/JPROC.2012.2194269
Michiels B, Fostier J, Bogaert I, De Zutter D (2015) Full-wave simulations of electromagnetic scattering problems with billions of unknowns. IEEE Trans Antennas Propag 63(2):796–799. https://doi.org/10.1109/TAP.2014.2380438
Hughey S, Aktulga H, Vikram M, Lu M, Shanker B, Michielssen E (2019) Parallel wideband mlfma for analysis of electrically large, nonuniform, multiscale structures. IEEE Trans Antennas Propag 67(2):1094–1107. https://doi.org/10.1109/TAP.2018.2882621
MacKie-Mason B, Shao Y, Greenwood A, Peng Z (2018) Supercomputing-enabled first-principles analysis of radio wave propagation in urban environments. IEEE Trans Antennas Propag 66(12):6606–6617. https://doi.org/10.1109/TAP.2018.2874674
Yang M-L, Wu B-Y, Gao H-W, Sheng X-Q (2019) A ternary parallelization approach of mlfma for solving electromagnetic scattering problems with over 10 billion unknowns. IEEE Trans Antennas Propag 67(11):6965–6978. https://doi.org/10.1109/TAP.2019.2927660
Liu R-Q, Huang X-W, Du Y-L, Yang M-L, Sheng X-Q (2021) Massively parallel discontinuous galerkin surface integral equation method for solving large-scale electromagnetic scattering problems. IEEE Trans Antennas Propag 69(9):6122–6127. https://doi.org/10.1109/TAP.2021.3078558
Kong W-B, Zhou H-X, Zheng K-L, Hong W (2015) Analysis of multiscale problems using the mlfma with the assistance of the fft-based method. IEEE Trans Antennas Propag 63:4184–4188. https://doi.org/10.1109/TAP.2015.2444442
Fu H, Liao J, Yang J, Wang L, Song Z, Huang X, Yang C, Xue W, Liu F, Qiao F et al (2016) The sunway taihulight supercomputer: system and applications. Sci China Inf Sci. https://doi.org/10.1007/s11432-016-5588-7
Dongarra J (2016) Sunway TaihuLight supercomputer makes its appearance. Natl Sci Rev 3(3):265–266. https://doi.org/10.1093/nsr/nww044
Cwikla M, Aronsson J, Okhmatovski V (2010) Low-frequency mlfma on graphics processors. IEEE Antennas Wirel Propag Lett 9:8–11. https://doi.org/10.1109/LAWP.2010.2040571
Guan J, Yan S, Jin J-M (2013) An openmp-cuda implementation of multilevel fast multipole algorithm for electromagnetic simulation on multi-gpu computing systems. IEEE Trans Antennas Propag 61(7):3607–3616. https://doi.org/10.1109/TAP.2013.2258882
Phan T, Tran N, Kilic O (2021) Multi-level fast multipole algorithm for 3-d homogeneous dielectric objects using mpi-cuda on gpu cluster. Appl Comput Electromagn Soc J (ACES) 33(03):335–338
Tran N, Kilic O (2021) Parallel implementations of multilevel fast multipole algorithm on graphical processing unit cluster for large-scale electromagnetics objects. Appl Comput Electromagn Soc J (ACES) 33(02):180–183
Mu X, Zhou H-X, Chen K, Hong W (2014) Higher order method of moments with a parallel out-of-core lu solver on gpu/cpu platform. IEEE Trans Antennas Propag 62(11):5634–5646. https://doi.org/10.1109/TAP.2014.2350536
He W-J, Yang Z, Huang X-W, Wang W, Yang M-L, Sheng X-Q (2022) Solving electromagnetic scattering problems with tens of billions of unknowns using gpu accelerated massively parallel mlfma. IEEE Trans Antennas Propag 70(7):5672–5682. https://doi.org/10.1109/TAP.2022.3161520
He W-J, Yang M-L, Wang W, Sheng X-Q (2021) Efficient parallelization of multilevel fast multipole algorithm for electromagnetic simulation on many-core sw26010 processor. J Supercomput 77:1502–1516. https://doi.org/10.1007/s11227-020-03308-9