DAGuE: A generic distributed DAG engine for High Performance Computing
References
Bernstein, 1966, Analysis of programs for parallel processing, IEEE Transactions on Electronic Computers, EC-15, 757, 10.1109/PGEC.1966.264565
E.G. Coffman, Jr., P.J. Denning, Operating Systems Theory, Prentice Hall Professional Technical Reference, 1973.
1992
J. Yu, R. Buyya, A taxonomy of workflow management systems for grid computing, Tech. rep., Journal of Grid Computing, 2005.
O. Delannoy, N. Emad, S. Petiton, Workflow global computing with YML, in: 7th IEEE/ACM International Conference on Grid Computing, 2006.
Buttari, 2006, The impact of multicore on math software, vol. 4699, 1
Chan, 2008, SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, 123
E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak, J. Langou, H. Ltaief, P. Luszczek, S. Tomov, Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series 180.
R. Dolbeau, S. Bihan, F. Bodin, HMPP: A hybrid multi-core parallel programming environment, in: Workshop on General Purpose Processing on Graphics Processing Units (GPGPU 2007), 2007.
Augonnet, 2011, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency and Computation: Practice and Experience, 23, 187, 10.1002/cpe.1631
J. Perez, R. Badia, J. Labarta, A dependency-aware task-based programming environment for multi-core architectures, in: IEEE International Conference on Cluster Computing, 2008, pp. 142–151.
Song, 2009, Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, 1
C. Augonnet, S. Thibault, R. Namyst, P.-A. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, in: Euro-Par 2009 Proceedings, LNCS, Delft, The Netherlands, 2009.
Cosnard, 2001, Automatic parallelization techniques based on compact DAG extraction and symbolic scheduling, Parallel Processing Letters, 11, 151, 10.1142/S012962640100049X
Cosnard, 2004, Compact DAG representation and its symbolic scheduling, Journal of Parallel and Distributed Computing, 64, 921, 10.1016/j.jpdc.2004.05.001
E. Jeannot, Automatic multithreaded parallel program generation for message passing multiprocessors using parameterized task graphs, in: International Conference ‘Parallel Computing 2001’ (ParCo2001), 2001.
Husbands, 2007, Multi-threading and one-sided communication in parallel LU factorization
Gustavson, 2009, Distributed SBP Cholesky factorization algorithms with near-optimal scheduling, ACM Transactions on Mathematical Software, 36, 1, 10.1145/1499096.1499100
W. Pugh, The omega test: a fast and practical integer programming algorithm for dependence analysis, in: Supercomputing ’91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, New York, NY, USA, 1991, pp. 4–13.
U.A. Acar, G.E. Blelloch, R.D. Blumofe, The data locality of work stealing, in: SPAA’00, 2000, pp. 1–12.
F. Broquedis, J. Clet Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault, R. Namyst, hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, in: IEEE (Ed.), PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, Pisa, Italy, 2010.
Gustavson, 2006, Minimal data copy for dense linear algebra factorization, vol. 4699, 540
G.W. Stewart, Matrix algorithms, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2001.
Buttari, 2009, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, 35, 38, 10.1016/j.parco.2008.10.002
Buttari, 2008, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, 20, 1573, 10.1002/cpe.1301
Schreiber, 1991, A storage-efficient WY representation for products of Householder transformations, SIAM J. Sci. Stat. Comput., 10, 53, 10.1137/0910005
Quintana-Ortí, 2008, Updating an LU factorization with pivoting, ACM Transactions on Mathematical Software, 35, 11, 10.1145/1377612.1377615
Bolze, 2006, Grid’5000: a large scale and highly reconfigurable experimental grid testbed, IJHPCA, 20, 481
Blackford, 1997, ScaLAPACK: a linear algebra library for message-passing computers
Dongarra, 2003, The LINPACK benchmark: past, present and future, Concurrency and Computation: Practice and Experience, 15, 803, 10.1002/cpe.728
Choi, 1995, ScaLAPACK: a portable linear algebra library for distributed memory computers – design issues and performance, vol. 1041, 95
Q.O. Snell, A.R. Mikler, J.L. Gustafson, NetPIPE: A network protocol independent performance evaluator, in: IASTED International Conference on Intelligent Information Management and Systems, 1996.
J. Dongarra, P. Beckman, et al., The international exascale software project roadmap, Tech. rep., IESP, 2011, http://www.exascale.org/iesp.
