Graphics processing unit (GPU) programming strategies and trends in GPU computing

Journal of Parallel and Distributed Computing - Volume 73, Issue 1 - Pages 4-13 - 2013
André R. Brodtkorb (1), Trond Runar Hagen (2), Martin Lilleeng Sætra (3)
(1) SINTEF ICT, Department of Applied Mathematics, P.O. Box 124, Blindern, NO-0314 Oslo, Norway
(2) SINTEF ICT, Department of Applied Mathematics, P.O. Box 124, Blindern, NO-0314 Oslo, Norway and Centre of Mathematics for Applications, University of Oslo, P.O. Box 1053 Blindern, NO-0316 Oslo, No ...
(3) Centre of Mathematics for Applications, University of Oslo, P.O. Box 1053, Blindern, NO-0316 Oslo, Norway

References

Advanced Micro Devices, AMD Fusion family of APUs: enabling a superior, immersive PC experience, Technical report, 2010.

Amdahl, 2000

K. Asanovic, R. Bodik, B. Catanzaro, J. Gebis, P. Husbands, K. Keutzer, D. Patterson, W. Plishker, J. Shalf, S. Williams, K. Yelick, The landscape of parallel computing research: a view from Berkeley, Technical report, EECS Department, University of California, Berkeley, December 2006.

Brodtkorb, 2010, State-of-the-art in heterogeneous computing, Scientific Programming, 18, 1, 10.1155/2010/540159

Brodtkorb, 2012, Efficient shallow water simulations on GPUs: implementation, visualization, verification, and validation, Computers & Fluids, 55, 1, 10.1016/j.compfluid.2011.10.012

A. Davidson, J.D. Owens, Toward techniques for auto-tuning GPU algorithms, in: Proceedings of PARA 2010: State of the Art in Scientific and Parallel Computing, 2010.

M. Harris, NVIDIA GPU computing SDK 4.1: optimizing parallel reduction in CUDA, 2011.

Harris

Intel, Intel Many Integrated Core (Intel MIC) architecture: ISC’11 demos and performance description, Technical report, 2011.

Intel Labs, The SCC platform overview, Technical report, Intel Corporation, 2010.

Knuth, 1974, Structured programming with go to statements, Computing Surveys, 6, 261, 10.1145/356635.356640

Y. Li, J. Dongarra, S. Tomov, A note on auto-tuning GEMM for GPUs, in: Proceedings of the 9th International Conference on Computational Science: Part I, 2009.

Little, 2008

Marr, 2002, Hyper-threading technology architecture and microarchitecture, Intel Technology Journal, 6, 1

P. Micikevicius, Analysis-driven performance optimization, [Conference presentation], 2010 GPU Technology Conference, Session 2012, 2010.

P. Micikevicius, Fundamental performance optimizations for GPUs, [Conference presentation], 2010 GPU Technology Conference, Session 2011, 2010.

NVIDIA, NVIDIA’s next generation CUDA compute architecture: Fermi, 2010.

NVIDIA, NVIDIA CUDA programming guide 4.1, 2011.

NVIDIA, NVIDIA GeForce GTX680, Technical report, NVIDIA Corporation, 2012.

Owens, 2008, GPU computing, Proceedings of the IEEE, 96, 879, 10.1109/JPROC.2008.917757

Seiler, 2008, Larrabee: a many-core x86 architecture for visual computing, ACM Transactions on Graphics, 27, 18:1, 10.1145/1360612.1360617

G. Taylor, Energy efficient circuit design and the future of power delivery, [Conference presentation], Electrical Performance of Electronic Packaging and Systems, October 2009.

Top 500 supercomputer sites, http://www.top500.org/, November 2011.

Vangal, 2008, An 80-tile sub-100-W teraflops processor in 65-nm CMOS, IEEE Journal of Solid-State Circuits, 43, 29, 10.1109/JSSC.2007.910957