Multiobjective evaluation and optimization of CMT-bone on multiple CPU/GPU systems
References
DOE, 2009
Banerjee, 2016, CMT-bone – a proxy application for compressible multiphase turbulent flows, IEEE International Conference on High Performance Computing
Gadou, 2016, Multiobjective optimization of CMT-bone on hybrid processors, IEEE International Green and Sustainable Computing
Banerjee, 2016, Performance and energy benchmarking of spectral solvers on hybrid multicore machines, Sustain. Comput. Inform. Syst.
Ou, 1995, Architecture-independent locality-improving transformations of computational graphs embedded in k-dimensions, Proceedings of the 9th International Conference on Supercomputing, 289
Al-Furaih, 2000, Parallel construction of multidimensional binary search trees, IEEE Trans. Parallel Distrib. Syst., 11, 136, 10.1109/71.841750
Ranka, 1995, Many-to-many personalized communication with bounded traffic, Proceedings of Frontiers of Massively Parallel Computation
Choudhary, 1992, Software support for irregular and loosely synchronous problems, Comput. Syst. Eng., 3
Intel, 2011
Kaddoura, 1997, Runtime support for parallelization of data-parallel applications on adaptive and nonuniform computational environments, J. Parallel Distrib. Comput., 43, 163, 10.1006/jpdc.1997.1340
Garcia, 2014, On optimization techniques for the matrix multiplication on hybrid CPU+GPU platforms, Ann. Multicore GPU Program., 1, 1
Gottlieb, 1998, Total variation-diminishing Runge-Kutta schemes, Math. Comput., 67, 73, 10.1090/S0025-5718-98-00913-2
Song, 2012, Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, ACM/IEEE Conference on Supercomputing, 10.1145/2304576.2304625
Dong, 2014, A step towards energy efficient computing: redesigning a hydrodynamic application on CPU-GPU, IPDPS
Fischer, 2005, Hybrid Schwarz-multigrid methods for the spectral element method: extensions to Navier–Stokes, vol. 40, 35
Jacobsen, 2010, An MPI-CUDA Implementation for Massively Parallel Incompressible Flow Computations on Multi-GPU Clusters, 48th AIAA Aerospace Sciences Meeting and Exhibit
Cecka, 2011, Assembly of finite element methods on graphics processors, Int. J. Numer. Methods Eng., 85
Liou, 1996, vol. 129
Bolz, 2003, Sparse matrix solvers on the GPU: conjugate gradients and multigrid, ACM Trans. Graph., 10.1145/882262.882364
Wang, 2011, Large scale plane wave pseudopotential density functional theory calculations on GPU clusters, SC
Woolley, 2013
Jhurani, 2015, A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices, J. Parallel Distrib. Comput., 75, 133, 10.1016/j.jpdc.2014.09.003
NVIDIA, 2013, CUDA Basic Linear Algebra Subroutines (cuBLAS) library