PARCSIM: a parallel computing simulator for scalable software optimization
Abstract
PARCSIM is a parallel software simulator that lets a user capture, through a graphical interface, matrix algorithm schemes that solve scientific problems. With this tool, the user can analyse the execution times that would be obtained with different spatio-temporal mappings of computational tasks onto the available computational units, different parallelism parameters and different computational libraries. Furthermore, for complex problem models, the self-optimization engine incorporated in the tool explores the very large tree of possible calculation grouping and mapping strategies, searching for the choice that makes the best use of the available hardware resources. The tool also offers polyalgorithmic resolution: it automatically selects the best of several software approaches for solving a given problem on the available hardware system. This work shows the usefulness of the simulator for efficiently solving hierarchical problems constructed from previously modelled subproblems. This is done by reusing, in a scalable way, the optimization information of these subproblems to establish the best execution configuration for the composite problem.
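The selection and reuse ideas described above can be illustrated with a minimal sketch. It is not PARCSIM's engine: the cost model, the library names `libA` and `libB`, and the functions `estimated_time`, `best_configuration` and `tune_composite` are all hypothetical, invented only to show how a configuration might be chosen per subproblem and then reused when a composite problem repeats that subproblem.

```python
from itertools import product

# Hypothetical cost model (invented figures, not measured data):
# estimated seconds for one routine under a given library and thread count.
def estimated_time(work, library, threads):
    overhead = {"libA": 0.002, "libB": 0.010}[library]  # cost added per thread
    speed = {"libA": 1.0, "libB": 2.0}[library]         # relative throughput
    return work / (speed * threads) + overhead * threads

def best_configuration(work, libraries, thread_counts):
    """Exhaustively pick the (library, threads) pair with the lowest estimate."""
    return min(
        product(libraries, thread_counts),
        key=lambda cfg: estimated_time(work, *cfg),
    )

def tune_composite(subproblems, libraries, thread_counts, cache):
    """Tune each named subproblem once; repeated names reuse the cached choice.

    Assumption: two subproblems with the same name have the same workload.
    """
    plan = {}
    for name, work in subproblems:
        if name not in cache:
            cache[name] = best_configuration(work, libraries, thread_counts)
        plan[name] = cache[name]
    return plan

if __name__ == "__main__":
    cache = {}
    composite = [("solve", 10.0), ("update", 0.01), ("solve", 10.0)]
    print(tune_composite(composite, ["libA", "libB"], [1, 2, 4, 8], cache))
```

Under this toy model the large subproblem favours the faster library with many threads, while the small one favours the low-overhead library with few threads, which is the kind of per-problem decision a polyalgorithmic approach makes. A real engine would search a far larger space of groupings and mappings than this exhaustive loop could handle.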