A power-performance balanced network-on-chip for mixed CPU-GPU systems
References
Mirhosseini, 2017, Binochs: bimodal network-on-chip for cpu-gpu heterogeneous systems, 1
Sadrosadati, 2018, Ltrf: enabling high-capacity register files for gpus via hardware/software cooperative register prefetching, 489
Lee, 2013, Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures, ACM Trans. Des. Autom. Electron. Syst. (TODAES), 18, 1
Mirhosseini, 2016, Quantifying the difference in resource demand among classic and modern noc workloads, 404
Zhan, 2016, Oscar: Orchestrating stt-ram cache traffic for heterogeneous cpu-gpu architectures, 1
Mirhosseini, 2019, Baran: bimodal adaptive reconfigurable-allocator network-on-chip, ACM Trans. Parallel Comput., 5, 10.1145/3294049
Jang, 2015, Bandwidth-efficient on-chip interconnect designs for gpgpus, 1
Bogdan, 2010, Workload characterization and its impact on multicore platform design, 231
Vijaykumar, 2015, A case for core-assisted bottleneck acceleration in gpus: enabling flexible data compression with assist warps, 41
Kayiran, 2014, Managing gpu concurrency in heterogeneous architectures, 114
Mirhosseini, 2015, An energy-efficient virtual channel power-gating mechanism for on-chip networks, Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, 1527
Mehrvarzy, 2016, Power- and performance-efficient cluster-based network-on-chip with reconfigurable topology, Microprocess. Microsyst., 46, 122, 10.1016/j.micpro.2016.03.004
Jalili, 2016, Power-efficient partially-adaptive routing in on-chip mesh networks, 65
Teimouri, 2013, Power and performance efficient partial circuits in packet-switched networks-on-chip, 509
Rahmati, 2012, Power-efficient deterministic and adaptive routing in torus networks-on-chip, Microprocess. Microsyst., 36, 571, 10.1016/j.micpro.2011.05.009
Sabbaghi-Nadooshan, 2010, The 2d sem: a novel high-performance and low-power mesh-based topology for networks-on-chip, Int. J. Parallel Emerg. Distrib. Syst., 25, 331, 10.1080/17445760902894712
Arjomand, 2010, Power-performance analysis of networks-on-chip with arbitrary buffer allocation schemes, IEEE Trans. Comput. Aid. Des. Integ. Circuits Syst., 29, 1558, 10.1109/TCAD.2010.2061171
Arjomand, 2010, Voltage-frequency planning for thermal-aware, low-power design of regular 3-d nocs, 57
Arjomand, 2009, A comprehensive power-performance model for nocs with multi-flit channel buffers, 470
Modarressi, 2009, Performance and power efficient on-chip communication using adaptive virtual point-to-point connections, 203
Sabbaghi-Nadooshan, 2008, A novel high-performance and low-power mesh-based noc, 1
Modarressi, 2007, Power-aware mapping for reconfigurable noc architectures, 417
Sadrosadati, 2015, An efficient dvs scheme for on-chip networks using reconfigurable virtual channel allocators, 249
Yin, 2012, Energy-efficient non-minimal path on-chip interconnection network for heterogeneous systems, 57
Matsutani, 2012, A multi-vdd dynamic variable-pipeline on-chip router for cmps, 407
Dreslinski, 2010, Near-threshold computing: reclaiming moore's law through energy efficient integrated circuits, Proc. IEEE, 98, 253, 10.1109/JPROC.2009.2034764
Mishra, 2013, A heterogeneous multiple network-on-chip design: an application-aware approach, 1
Che, 2009, Rodinia: a benchmark suite for heterogeneous computing, 44
Gopireddy, 2016, Scalcore: designing a core for voltage scalability, 681
Jan, 2012, A 22nm soc platform technology featuring 3-d tri-gate and high-k/metal gate, optimized for ultra low power, high performance and high density soc applications, 1
Dimitrakopoulos, 2008, Fast arbiters for on-chip network switches, 664
Evripidou, 2012, Virtualizing virtual channels for increased network-on-chip robustness and upgradeability, 21
Gratz, 2008, Regional congestion awareness for load balance in networks-on-chip, 203
Chiu, 2000, The odd-even turn model for adaptive routing, IEEE Trans. Parallel Distrib. Syst., 11, 729, 10.1109/71.877831
Dally, 1993, Deadlock-free adaptive routing in multicomputer networks using virtual channels, IEEE Trans. Parallel Distrib. Syst., 4, 466, 10.1109/71.219761
Valiant, 1982, A scheme for fast parallel communication, SIAM J. Comput., 11, 350, 10.1137/0211027
Leng, 2013, Gpuwattch: enabling energy optimizations in gpgpus, 487
Sethia, 2014, Equalizer: dynamic tuning of gpu resources for efficient execution, 647
Kim, 2008, System level analysis of fast, per-core dvfs using on-chip switching regulators, 123
Godycki, 2014, Enabling realistic fine-grain voltage scaling with reconfigurable power distribution networks, 381
Ausavarungnirun, 2012, Staged memory scheduling: achieving high performance and scalability in heterogeneous systems, 416
Bakhoda, 2009, Analyzing cuda workloads using a detailed gpu simulator, 163
Jiang, 2013, A detailed and flexible cycle-accurate network-on-chip simulator, 86
Sadrosadati, 2017, Effective cache bank placement for gpus, 2017, 31
Abts, 2009, Achieving predictable performance through better memory controller placement in many-core CMPs, ACM SIGARCH Comput. Archit. News, 37, 451, 10.1145/1555815.1555810
Lee, 2012, Tap: A tlp-aware cache management policy for a cpu-gpu heterogeneous architecture, 1
Jeong, 2012, A qos-aware memory controller for dynamically balancing gpu and cpu bandwidth use in an mpsoc, 850
Jog, 2014, Application-aware memory system for fair and efficient execution of concurrent gpgpu applications, 1
Jog, 2013, Orchestrated scheduling and prefetching for gpgpus, 332
Jog, 2013, Owl: cooperative thread array aware scheduling techniques for improving gpgpu performance, ACM SIGPLAN Not., 48, 395
Kayiran, 2013, Neither more nor less: optimizing thread-level parallelism for gpgpus, 157
Rogers, 2012, Cache-conscious wavefront scheduling, 72
Das, 2010, Aergia: exploiting packet latency slack in on-chip networks, 106
Kumar, 2007, Express virtual channels: towards the ideal interconnection fabric, ACM SIGARCH Comput. Archit. News, 35, 150, 10.1145/1273440.1250681
Nicopoulos, 2006, Vichar: a dynamic virtual channel regulator for network-on-chip routers, 333
Choi, 2016, Hybrid network-on-chip architectures for accelerating deep learning kernels on heterogeneous manycore platforms, 1
Abadal, 2016, Wisync: an architecture for fast synchronization through on-chip wireless communication, ACM SIGPLAN Not., 51, 3, 10.1145/2954679.2872396
Sadrosadati, 2019, Itap: idle-time-aware power management for gpu execution units, ACM Trans. Archit. Code Optim., 16, 10.1145/3291606
Miller, 2012, Booster: reactive core acceleration for mitigating the effects of process variation and application imbalance in low-voltage chips, 1
Dreslinski, 2007, An energy efficient parallel architecture using near threshold operation, 175
Aghilinasab, 2016, Reducing power consumption of gpgpus through instruction reordering, 356
Ruhl, 2013, Ia-32 Processor with a wide-voltage-operating range in 32-nm cmos, IEEE Micro, 33, 28, 10.1109/MM.2013.8