Performance improvements of differential operators code for MPS method on GPU
Tóm tắt
In the present study, performance improvements of the particle search and particle interaction calculation steps constituting the performance bottleneck in the moving particle simulation method are achieved by developing GPU-compatible algorithms for many core processor architectures. In the improvements of particle search, bucket loops of the cell-linked list are changed to a loop structure having fewer local variables and the linked list and the forward star of particle search algorithms within a bucket are compared. In the particle interaction calculation, the problem of the ratio of particles within the interaction domain to the neighboring particle candidates being quite low is improved. By these improvements, a performance efficiency of 24.7 % can be achieved for the first-order polynomial approximation scheme using NVIDIA Tesla K20, CUDA-6.5, and double-precision floating-point operations.
Tài liệu tham khảo
Koshizuka S, Oka Y (1996) Moving-particle semi-implicit method for fragmentation of incompressible fluid. Nucl Sci Eng (NSE) 123:421–434
Koshizuka S, Nobe A, Oka Y (1998) Numerical analysis of breaking waves using the moving particle semi-implicit method. Int J Numer Methods Fluids 26:751–769
Lucy LB (1977) A numerical approach to the testing of the fission hypothesis. Astron J (AJ) 82:1013–1024
Gingold RA, Monaghan JJ (1977) Smoothed particle hydrodynamics: theory and application to non-spherical stars. Mon Not R Astron Soc (MNRAS) 181:375–389
Murotani K, Koshizuka S, Tamai T, Shibata K, Mitsume N, Yoshimura S, Tanaka S, Hasegawa K, Nagai E, Fujisawa T (2014) Development of hierarchical domain decomposition explicit MPS method and application to large-scale tsunami analysis with floating objects. J Adv Simul Sci Eng (JASSE) 1(1):16–35
Iribe T, Fujisawa T, Koshizuka S (2010) Reduction of communication in parallel computing of particle method for flow simulation of seaside areas. Coast Eng J 52(4):287–304
Marrone S, Bouscasse B, Colagrossi A (2012) Numerical modeling of ship wave patterns through a hybrid OpenMP/MPI SPH solver. In: 2nd international conference on violent flows, Nantes (France), 221–228
Leffe MD, Guilcher PM, Candelier J, Le TD, Oger G, Grenier N (2012) SPH for naval applications. In: 2nd international conference on violent flows, Nantes (France), 229–237
Kipfer P, Segal M, Westermann R (2004) UberFlow: a GPU-based particle engine. In: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on graphics hardware (HWWS ’04), 115–122
Kolb A, Cuntz N (2005) Dynamic particle coupling for GPU-based fluid simulation. In Proceedings of the 18th symposium on simulation technique, pp 722–727
Green S (2007) Particle simulation using CUDA. NVIDIA Documentation. http://docs.nvidia.com/cuda/samples/5_Simulations/particles/doc/particles.pdf. Accessed
Wilt N (2012) CUDA handbook: N-body. Addison-Wesley, New York, pp 421–447
Nyland L, Harris M, Prins J (2007) Fast N-body simulation with CUDA. GPU gems 3, Chapter 31. Addison-Wesley, New York
Chen FG, Ge W, Li JH (2009) Molecular dynamics simulation of complex multiphase flow on a computer cluster with GPUs. Sci China Ser B 52(3):372–380
Chen MJ, Xiao GB, Chen JX, Chun YW (2010) Research on the influence of machining introduced sub-surface defects and residue stress upon the mechanical properties of single crystal copper. Sci China Technol Sci 53(12):3161–3167
Joshua AA, Chris DL, Alex T (2008) General purpose molecular dynamics simulations fully implemented on graphics processing units. J Comput Phys 227:5342–5359
Harada T, Koshizuka S, Kawaguchi Y (2007) Real-time fluid simulation coupled with cloth. Theory and practice of, computer graphics, pp 13–20
Harada T (2007) Real-time rigid body simulation on GPUs. GPU gems 3. Addison-Wesley, New York
Shigeto Y, Sakai M (2011) Parallel computing of discrete element method on multi-core processors. Particuology 9(4):398–405
Charles AR, Benjamin JG, Johannes GK (2010) Large-scale powder mixer simulations using massively parallel GPU architectures. Chem Eng Sci 65(24):6435–6442
Nishiura D, Sakaguchi H (2011) Parallel-vector algorithms for particle simulations on shared-memory multiprocessors. J Comput Phys 230(5–1):1923–1938
Zhu XS, Cheng L, Lu L, Teng B (2011) Implementation of the moving particle semi-implicit method on GPU. Sci China Phys Mech Astron 54(3):523–532
Hori C, Gotoh H, Ikari H, Khayyer A (2011) GPU-acceleration for moving particle semi-implicit method. Comput Fluids 51(1):174–183
Khayyer A, Gotoh H (2013) Enhancement of performance and stability of MPS mesh-free particle method for multiphase flows characterized by high density ratios. J Comput Phys 242(1):211–233
Harada T, Koshizuka S, Kawaguchi Y (2007) Smoothed particle hydrodynamics on GPUs. In; Proceedings of computer graphics international, pp 63–70
Harada T, Koshizuka S, Kawaguchi Y (2007) Slided data structure for particle-based simulations on GPUs. In: Proceedings of the 5th international conference on computer graphics and interactive techniques in Australia and Southeast Asia (GRAPHITE ’07), Perth, Western Australia, pp 55–62
Alejandro JCC, Jose MD, Anxo B, Moncho GG, Benedict DR (2011) GPUs, a new tool of acceleration in CFD: efficiency and reliability on smoothed particle hydrodynamics methods. PloS one Public Libr Sci 6(6):e20685
Daniel VB, Jose MD, Benedict DR, Alejandro JCC (2013) Towards accelerating smoothed particle hydrodynamics simulations for free-surface flows on multi-GPU clusters. J Parallel Distrib Comput 73(11):1483–1493
Jose MD, Alejandro JCC, Moncho GG (2013) Optimization strategies for CPU and GPU implementations of a smoothed particle hydrodynamics method. Comput Phys Commun 184(3):617–627
Ericson C (2004) Real-time collision detection (The Morgan Kaufmann Series in Interactive 3-D Technology). CRC Press, Boca Raton
Heinz TH, Hunenberger PH (2004) A fast pairlist-construction algorithm for molecular simulations under periodic boundary conditions. J Comput Chem 25(12):1474–1486
Mattson W, Rice BM (1999) Near-neighbor calculations using a modified cell-linked list method. Comput Phys Commun 119(2–3):135–148
Gonnet P (2007) A simple algorithm to accelerate the computation of non-bonded interactions in cell-based molecular dynamics simulations. J Comput Chem 28(2):570–573
Yao Z, Wang JS, Cheng M (2004) Improved O(n) neighbor list method using domain decomposition and data sorting. High Perform Comput Eng Syst (HPCES) 161(1–2):27–35
Satish N, Harris M, Garland M, (2009) Designing efficient sorting algorithms for manycore GPUs. In: Proceedings of the 2009 IEEE international symposium on parallel & distributed processing, pp 1–10
Ha L, Kruger J, Silva TC (2009) Fast four-way parallel radix sorting on GPUs. Comput Graph Forum 28(8):2368–2378
Merrill GD, Grimshaw SA (2010) Revisiting sorting for GPGPU stream architectures. In: Proceedings of the 19th international conference on parallel architectures and compilation techniques (PACT ’10), Edmonton, Canada, pp 545–546
Hou Q, Zhou K, Guo B (2008) BSGP: bulk-synchronous GPU programming. In: ACM transactions on graphics (TOG), proceedings of ACM SIGGRAPH 2008 TOG Homepage, 27(3):Article No.19
Le GS (2007) Broad-phase collision detection with CUDA: GPU gems 3. Addison-Wesley, Boston
Tamai T, Koshizuka S (2014) Least squares moving particle semi-implicit method. Comput Part Mech 1(3):277–305
Dilts GA (1999) Moving-least-squares-particle hydrodynamics: I. Consistency and stability. Int J Numer Methods Eng 44:1115–1155
Liu WK, Jun S, Zhang YF (2005) Reproducing kernel particle methods. Int J Numer Methods Fluids 20:1081–1106
Ahuja RK, Magnanti TL, Orlin JB (1993) Network flows: theory, algorithms and applications. Prentice Hall, Englewood Cliffs
Ahuja RK, Mehlhorn K, Orlin JB, Tarjan RE (1990) Faster algorithms for the shortest path problem. J Assoc Comput Mach 37:213–223
Cherkassky BV, Goldberg AV, Radzik T (1993) Shortest paths algorithms: theory and experimental evaluation. Technical Report 93(1480), Computer Science Department, Stanford University
Fredman ML, Tarjan RE (1987) Fibonacci heaps and their uses in improved network optimization algorithms. J ACM 34:596–615
Gallo G, Pallottino S (1988) Shortest paths algorithms. Ann Oper Res 13:3–79
Hung MH, Divoky JJ (1988) A computational study of efficient shortest path algorithms. Comput Oper Res 15:567–576
Mondou JF, Crainic TG, Nguyen S (1991) Shortest path algorithms: a computational study with the C programming language. Comput Oper Res 18:767–786
Pallottino S (1984) Shortest-path methods: complexity, interrelations and new propositions. Networks 14:257–267
Zhan FB (1997) Three fastest shortest path algorithms on real road networks: data structures and procedures. J Geogr Inf Decis Anal 1(1):69–82
Zhan FB, Noon CE (1996) Shortest path algorithms: an evaluation using real road networks. Transp Sci 32(1):65–73