Kernel scheduling approach for reducing GPU energy consumption

Journal of Computational Science - Volume 28 - Pages 360-368 - 2018

Junke Li¹,², Bing Guo¹, Yan Shen³, Deguang Li¹, Yanhui Huang¹

¹College of Computer Science, Sichuan University, Chengdu, China

²School of Computer and Information, Qiannan Normal University for Nationalities, Duyun, China

³School of Control Engineering, Chengdu University of Information Technology, Chengdu, China

