DAGuE: A generic distributed DAG engine for High Performance Computing

Parallel Computing, Volume 38, Pages 37-51, 2012
George Bosilca (1), Aurelien Bouteiller (1), Anthony Danalis (1), Thomas Herault (1), Pierre Lemarinier (2), Jack Dongarra (1, 3)
(1) Innovative Computing Laboratory, The University of Tennessee, United States
(2) IRISA, Université de Rennes 1, France
(3) Oak Ridge National Laboratory, United States
