A survey on techniques for cooperative CPU-GPU computing

Sustainable Computing: Informatics and Systems - Volume 19 - Pages 72-85 - 2018

K. V. S. V. N. Raju¹, Niranjan N. Chiplunkar¹

¹Dept. of CS&E, NMAMIT, Nitte, Karkala, Karnataka, India

References

Lee, 2014, Boosting CUDA applications with CPU-GPU hybrid computing, Int. J. Parallel Program., 42, 384, 10.1007/s10766-013-0252-y

Pandit, 2014, Fluidic kernels: cooperative execution of OpenCL programs on multiple heterogeneous devices, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, 273, 10.1145/2544137.2544163

Lee, 2015, SKMD: single kernel on multiple devices for transparent CPU-GPU collaboration, ACM Trans. Comput. Syst. (TOCS), 33, 9, 10.1145/2798725

Piao, 2015, JAWS: a JavaScript framework for adaptive CPU-GPU work sharing, ACM SIGPLAN Notices, 50, 251, 10.1145/2858788.2688525

Tomov, 2010, Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 1

Agulleiro, 2012, Hybrid computing: CPU+GPU co-processing and its application to tomographic reconstruction, Ultramicroscopy, 115, 109, 10.1016/j.ultramic.2012.02.003

Song, 2012, Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems, Proceedings of the 26th ACM International Conference on Supercomputing, 365, 10.1145/2304576.2304625

Xu, 2012, Discrete particle simulation of gas-solid two-phase flows with multi-scale CPU-GPU hybrid computation, Chem. Eng. J., 207, 746, 10.1016/j.cej.2012.07.049

Teodoro, 2013, Efficient irregular wavefront propagation algorithms on hybrid CPU–GPU machines, Parallel Comput., 39, 189, 10.1016/j.parco.2013.03.001

Papadrakakis, 2011, A new era in scientific computing: domain decomposition methods in hybrid CPU–GPU architectures, Comput. Methods Appl. Mech. Eng., 200, 1490, 10.1016/j.cma.2011.01.013

Chakroun, 2013, Combining multi-core and GPU computing for solving combinatorial optimization problems, J. Parallel Distrib. Comput., 73, 1563, 10.1016/j.jpdc.2013.07.023

Chen, 2014, Adaptive block size for dense QR factorization in hybrid CPU-GPU systems via statistical modeling, Parallel Comput., 40, 70, 10.1016/j.parco.2014.03.001

Zhang, 2015, Accelerating aerial image simulation using improved CPU/GPU collaborative computing, Comput. Electr. Eng., 46, 176, 10.1016/j.compeleceng.2015.05.018

Wan, 2016, Efficient CPU-GPU cooperative computing for solving the subset-sum problem, Concurr. Comput. Pract. Exp., 28, 492, 10.1002/cpe.3629

Yao, 2016, STEM image simulation with hybrid CPU/GPU programming, Ultramicroscopy, 166, 1, 10.1016/j.ultramic.2016.04.001

Liu, 2016, Hybrid CPU-GPU scheduling and execution of tree traversals, Proceedings of the 2016 International Conference on Supercomputing, 2

Antoniadis, 2017, A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems, Electron. Notes Discret. Math., 58, 47, 10.1016/j.endm.2017.03.007

Wende, 2012, On improving the performance of multi-threaded CUDA applications with concurrent kernel execution by kernel reordering, 2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC), 74, 10.1109/SAAHPC.2012.12

Auerbach, 2012, A compiler and runtime for heterogeneous computing, Proceedings of the 49th Annual Design Automation Conference, 271, 10.1145/2228360.2228411

Robson, 2016, Runtime coordinated heterogeneous tasks in Charm++, Proceedings of the Second International Workshop on Extreme Scale Programming Models and Middleware, 40

Huang, 2012, A CPU-GPGPU scheduler based on data transmission bandwidth of workload, 2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), 610

Boyer, 2013, Improving GPU performance prediction with data transfer modeling, 2013 IEEE 27th International Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 1097, 10.1109/IPDPSW.2013.236

Mokhtari, 2014, BigKernel--high performance CPU-GPU communication pipelining for big data-style applications, 2014 IEEE 28th International Parallel and Distributed Processing Symposium, 819, 10.1109/IPDPS.2014.89

Sunitha, 2017, Performance improvement of CUDA applications by reducing CPU-GPU data transfer overhead, 2017 International Conference on Inventive Communication and Computational Technologies (ICICCT), 211, 10.1109/ICICCT.2017.7975190

Lázaro-Muñoz, 2017, A tasks reordering model to reduce transfers overhead on GPUs, J. Parallel Distrib. Comput., 109, 258, 10.1016/j.jpdc.2017.06.015

Stratton, 2008, MCUDA: an efficient implementation of CUDA kernels for multi-core CPUs, LCPC, 2008, 16

Diamos, 2008, Harmony: an execution model and runtime for heterogeneous many core systems, Proceedings of the 17th International Symposium on High Performance Distributed Computing, 197, 10.1145/1383422.1383447

Papakonstantinou, 2009, FCUDA: enabling efficient compilation of CUDA kernels onto FPGAs, 2009 IEEE 7th Symposium on Application Specific Processors, SASP’09, 35, 10.1109/SASP.2009.5226333

Diamos, 2010, Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 353, 10.1145/1854273.1854318

Gummaraju, 2010, Twin peaks: a software platform for heterogeneous computing on general-purpose and graphics processors, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 205, 10.1145/1854273.1854302

Hong, 2010, MapCG: writing parallel program portable between CPU and GPU, Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, 217, 10.1145/1854273.1854303

Augonnet, 2011, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurr. Comput. Pract. Exp., 23, 187, 10.1002/cpe.1631

Wang, 2008, Task scheduling of parallel processing in CPU-GPU collaborative environment, International Conference on Computer Science and Information Technology, 2008. ICCSIT'08, 228, 10.1109/ICCSIT.2008.27

Luk, 2009, Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 45, 10.1145/1669112.1669121

Jiménez, 2009, Predictive runtime code scheduling for heterogeneous architectures, HiPEAC, 9, 19

Gregg, 2012, Fine-grained resource sharing for concurrent GPGPU kernels, Proceedings of the 4th USENIX Conference on Hot Topics in Parallelism (HotPar'12)

Zhong, 2012, Data partitioning on heterogeneous multicore and multi-GPU systems using functional performance models of data-parallel applications, 2012 IEEE International Conference on Cluster Computing (CLUSTER), 191, 10.1109/CLUSTER.2012.34

Sun, 2012, Enabling task-level scheduling on heterogeneous platforms, Proceedings of the 5th Annual Workshop on General Purpose Processing With Graphics Processing Units, 84, 10.1145/2159430.2159440

Grasso, 2013, Automatic problem size sensitive task partitioning on heterogeneous parallel systems, ACM SIGPLAN Notices, 48, 281, 10.1145/2517327.2442545

Zhong, 2014, Kernelet: high-throughput GPU kernel executions with dynamic slicing and scheduling, IEEE Trans. Parallel Distrib. Syst., 25, 1522, 10.1109/TPDS.2013.257

Yao, 2013, Partition strategies for C source programs to support CPU+GPU coordination computing, International Conference on Information Science and Cloud Computing, 39

Aciu, 2013, Algorithm for cooperative CPU-GPU computing, 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), 352

Wen, 2014, Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms, 21st International Conference on High Performance Computing (HiPC), 1

Li, 2014, Symbiotic scheduling of concurrent GPU kernels for performance and energy optimizations, Proceedings of the 11th ACM Conference on Computing Frontiers, 36

Vilches, 2015, Adaptive partitioning for irregular applications on heterogeneous CPU-GPU chips, Procedia Comput. Sci., 51, 140, 10.1016/j.procs.2015.05.213

Wang, 2016, Performance optimization for CPU-GPU heterogeneous parallel system, 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 1259, 10.1109/FSKD.2016.7603359

Wang, 2016, A user mode CPU-GPU scheduling framework for hybrid workloads, Future Gener. Comput. Syst., 63, 25, 10.1016/j.future.2016.03.011

Wang, 2010, Power-efficient work distribution method for CPU-GPU heterogeneous system, 2010 International Symposium on Parallel and Distributed Processing With Applications (ISPA), 122, 10.1109/ISPA.2010.22

Ge, 2014, PEACH: a model for performance and energy aware cooperative hybrid computing, Proceedings of the 11th ACM Conference on Computing Frontiers, 24

Lang, 2014, An execution time and energy model for an energy-aware execution of a conjugate gradient method with CPU/GPU collaboration, J. Parallel Distrib. Comput., 74, 2884, 10.1016/j.jpdc.2014.06.001

Ma, 2016, Energy conservation for GPU-CPU architectures with dynamic workload division and frequency scaling, Sustain. Comput. Inform. Syst., 12, 21

Siehl, 2016, Power-aware heterogeneous computing through CPU-GPU hybridization, Energy, 20, 60

Chau, 2017, Energy efficient job scheduling with DVFS for CPU-GPU heterogeneous systems, Proceedings of the Eighth International Conference on Future Energy Systems, 1

Gong, 2017, Cooperative DVFS for energy-efficient HEVC decoding on embedded CPU-GPU architecture, Proceedings of the 54th Annual Design Automation Conference, 42

Kale, 1993, CHARM++: a portable concurrent object oriented system based on C++, ACM SIGPLAN Notices, 28, 91, 10.1145/167962.165874

Lattner, 2004, LLVM: a compilation framework for lifelong program analysis & transformation, Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization, 75, 10.1109/CGO.2004.1281665

Stafford, 2017, To distribute or not to distribute: the question of load balancing for performance or energy, European Conference on Parallel Processing, 710

Daga, 2011, On the efficacy of a fused CPU+GPU processor (or APU) for parallel computing, 2011 Symposium on Application Accelerators in High-Performance Computing (SAAHPC), 141, 10.1109/SAAHPC.2011.29

Lee, 2013, Performance characterization of data-intensive kernels on AMD fusion architectures, Computer Science - Research and Development, 28, 175, 10.1007/s00450-012-0209-1

Spafford, 2012, The tradeoffs of fused memory hierarchies in heterogeneous computing architectures, Proceedings of the 9th Conference on Computing Frontiers, 103, 10.1145/2212908.2212924

Said, 2016, On the efficiency of the accelerated processing unit for scientific computing, Proceedings of the 24th High Performance Computing Symposium, 25

Dashti, 2017, Analyzing memory management methods on integrated CPU-GPU systems, Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, 59, 10.1145/3092255.3092256

Daga, 2012, Exploiting coarse-grained parallelism in B+ tree searches on an APU, High Performance Computing, Networking, Storage and Analysis (SCC), 2012 SC Companion, 240, 10.1109/SC.Companion.2012.40

Gu, 2014, Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems, Proceedings of 5th Asia-Pacific Workshop on Systems, 12

Wyrzykowski, 2013, Efficient execution of erasure codes on AMD APU architecture, International Conference on Parallel Processing and Applied Mathematics, 613

Delorme, 2013, Parallel radix sort on the AMD fusion accelerated processing unit, 2013 42nd International Conference on Parallel Processing (ICPP), 339, 10.1109/ICPP.2013.43

He, 2013, Revisiting co-processing for hash joins on the coupled CPU-GPU architecture, Proceedings of the VLDB Endowment, 6, 889, 10.14778/2536206.2536216

He, 2014, In-cache query co-processing on coupled CPU-GPU architectures, Proceedings of the VLDB Endowment, 8, 329, 10.14778/2735496.2735497

Eberhart, 2014, Hybrid strategy for stencil computations on the APU, Proceedings of the 1st International Workshop on High-Performance Stencil Computations, 43

Cheng, 2015, Energy-efficient query processing on embedded CPU-GPU architectures, Proceedings of the 11th International Workshop on Data Management on New Hardware, 10

Zhang, 2017, Understanding co-running behaviors on integrated CPU/GPU architectures, IEEE Trans. Parallel Distrib. Syst., 28, 905, 10.1109/TPDS.2016.2586074

Lupescu, 2017, Using the integrated GPU to improve CPU sort performance, 2017 46th International Conference on Parallel Processing Workshops (ICPPW), 39, 10.1109/ICPPW.2017.19

Zhang, 2017, FinePar: irregularity-aware fine-grained workload partitioning on integrated architectures, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 27, 10.1109/CGO.2017.7863726

Zhu, 2017, Co-run scheduling with power cap on integrated CPU-GPU systems, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 967, 10.1109/IPDPS.2017.124

Fang, 2017, Understanding data partition for applications on CPU-GPU integrated processors, International Conference on Mobile Ad-Hoc and Sensor Networks, 426

Mittal, 2015, A survey of CPU-GPU heterogeneous computing techniques, ACM Computing Surveys (CSUR), 47, 69, 10.1145/2788396

Insieme compiler and runtime infrastructure. Distributed and Parallel Systems Group, 2012. University of Innsbruck. URL http://insieme-compiler.org.

Web Worker. URL http://www.w3.org/TR/workers.

WebCL Standard. URL www.khronos.org/webcl/.

CUDA C Programming Guide, Version 8.0, Nvidia Corporation (2017). URL www.nvidia.com.

OpenCL Programming User Guide, rev 1.0, Advanced Micro Devices, Inc. (2013). URL www.amd.com.

OpenMP Application Program Interface, Version 4.0 (2013). URL www.openmp.org.
