Optimized HPL for AMD GPU and multi-core CPU usage

Computer Science - Research and Development - Volume 26, Issue 3-4 - Pages 153-164 - 2011
M. Bach¹, M. Kretz¹, V. Lindenstruth¹, D. Rohr¹
¹ Frankfurt Institute for Advanced Studies, Ruth-Moufang-Straße 1, 60438 Frankfurt am Main, Germany

References

Advanced Micro Devices: AMD stream computing guide. URL  http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf

Amdahl G (1967) Validity of the single processor approach to achieving large-scale computing capabilities. In: AFIPS conference proceedings, vol 30, pp 483–485

Drepper U (2007) What every programmer should know about memory. URL  http://www.akkadia.org/drepper/cpumemory.pdf

Goethe University of Frankfurt Center for Scientific Computing: LOEWE-CSC cluster. URL  http://csc.uni-frankfurt.de/csc/?51

Intel Corporation (2009) Intel threading building blocks reference manual. URL  http://software.intel.com/sites/products/documentation/hpc/tbb/reference.pdf

Nakasato N (2010) A fast GEMM implementation on a Cypress GPU. URL http://www.dcs.warwick.ac.uk/~sdh/pmbs10/pmbs10/Workshop_Programme_files/fastgemm.pdf

NVIDIA Corporation: CUBLAS library. URL  http://developer.download.nvidia.com/compute/cuda/1_0/CUBLAS_Library_1.0.pdf

Rohr D, Kretz M, Bach M (2010) Technical report, CALDGEMM and HPL. URL  http://code.compeng.uni-frankfurt.de/attachments/10/techreport.pdf

Texas Advanced Computing Center: GotoBLAS basic linear algebra library. URL  http://www.tacc.utexas.edu/tacc-projects/

University of Tennessee: High performance Linpack algorithm. URL http://www.netlib.org/benchmark/hpl/algorithm.html

Volkov V, Demmel J (2008) Benchmarking GPUs to tune dense linear algebra. In: SC 08 ACM/IEEE conference on supercomputing proceedings, pp 1–11