Kernel scheduling approach for reducing GPU energy consumption
References
A. Munshi, The OpenCL specification, in Khronos OpenCL Working Group, 2008.
NVIDIA CUDA compute unified device architecture-programming guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/, 2008.
Yoon, 2016, Virtual Thread: maximizing thread-level parallelism beyond GPU scheduling limit, Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium On. IEEE, 10.1109/ISCA.2016.59
Top 500 Supercomputer Sites Webpage, November 2015. http://www.top500.org.
Whitepaper NVIDIA’s Next Generation CUDA™ Compute Architecture: Kepler™ GK110, tech. rep., NVIDIA, 2012.
Tanasic, 2014, Enabling preemptive multiprogramming on GPUs. Computer Architecture (ISCA), 2014 ACM/IEEE 41st Annual International Symposium On. IEEE
Park, 2015, Chimera: collaborative preemption for multitasking on a shared GPU, ACM SIGARCH Comput. Archit. News, 43, 593, 10.1145/2786763.2694346
Wang, 2016, Simultaneous multikernel GPU: multi-tasking throughput processors via fine-grained sharing. High Performance Computer Architecture (HPCA), 2016 IEEE International Symposium On. IEEE
Adriaens, 2012, The case for GPGPU spatial multitasking. High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium On. IEEE
Ukidave, 2014, Runtime support for adaptive spatial partitioning and inter-kernel communication on GPUs. Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium On. IEEE
Liang, 2015, Efficient GPU spatial-temporal multitasking, IEEE Trans. Parallel Distrib. Syst., 26, 748, 10.1109/TPDS.2014.2313342
Aguilera, 2014, Fair share: allocation of GPU resources for both performance and fairness. Computer Design (ICCD), 2014 32nd IEEE International Conference On. IEEE
Xu, 2016, Warped-slicer: efficient intra-SM slicing through dynamic resource partitioning for GPU multiprogramming
B. Coon, J. Nickolls, J. Lindholm, R. Stoll, N. Wang, J. Choquette, K. Nickolls, Thread group scheduler for computing on a parallel thread processor, May 2012. US Patent 8732713.
Guevara, 2009, Enabling task parallelism in the CUDA scheduler, Workshop Programm. Models Emerg. Archit., 9
Wang, 2011, Exploiting concurrent kernel execution on graphic processing units. High Performance Computing and Simulation (HPCS), 2011 International Conference On. IEEE
Zhong, 2014, Kernelet: high-throughput GPU kernel executions with dynamic slicing and scheduling, IEEE Trans. Parallel Distrib. Syst., 25, 1522, 10.1109/TPDS.2013.257
Li, 2011, GPU resource sharing and virtualization on high performance computing systems. Parallel Processing (ICPP), 2011 International Conference On. IEEE
Gregg, 2012, Fine-grained resource sharing for concurrent GPGPU kernels, HotPar
Li, Teng, Vikram K. Narayana, Tarek El-Ghazawi. Reordering GPU Kernel Launches to Enable Efficient Concurrent Execution. arXiv preprint arXiv:1511.07983 (2015).
Pai, 2013, Improving GPGPU concurrency with elastic kernels, International Conference on Architectural Support for Programming Languages and Operating Systems
Ravi, 2011, Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework, Proceedings of the 20th International Symposium on High Performance Distributed Computing ACM, 10.1145/1996130.1996160
Li, 2015, A power-aware symbiotic scheduling algorithm for concurrent GPU kernels. Parallel and Distributed Systems (ICPADS), 2015 IEEE 21st International Conference On. IEEE
Jiao, 2015, Improving GPGPU energy-efficiency through concurrent kernel execution and DVFS. Code Generation and Optimization (CGO), 2015 IEEE/ACM International Symposium On. IEEE
Li, 2011, Energy-aware workload consolidation on GPU. Parallel Processing Workshops (ICPPW), 2011 40th International Conference On. IEEE
Wang, 2010, Kernel fusion: an effective method for better power efficiency on multithreaded GPU
Zhang, 2014, A cool scheduler for multi-core systems exploiting program phases, IEEE Trans. Comput., 63, 1061, 10.1109/TC.2012.283
Ryoo, 2008, Optimization principles and application performance evaluation of a multithreaded GPU using CUDA, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming ACM, 10.1145/1345206.1345220
Hong, 2009, An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, ACM SIGARCH Comput. Archit. News, 37, 10.1145/1555815.1555775
Li, 2009, McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. Microarchitecture, 2009, 42nd Annual IEEE/ACM International Symposium on. IEEE, 10.1145/1669112.1669172