Thermal-constrained memory management for three-dimensional DRAM-PCM memory with deep neural network applications

Microprocessors and Microsystems - Volume 89 - Page 104444 - 2022
Shu-Yen Lin (1), Shao-Cheng Wang (1)
(1) Department of Electrical Engineering, Yuan Ze University, 135 Yuantung Road, Chung-Li, Taiwan
