A prefetch control strategy based on improved hill-climbing method in asymmetric multi-core architecture
Abstract
Cache prefetching is a long-established way to reduce memory access latency. In multi-core systems, however, overly aggressive prefetching can degrade overall performance. Earlier prefetch throttling strategies typically derive thresholds from selected metrics and rein in an aggressive prefetcher once a threshold is exceeded. These strategies usually work well in homogeneous multi-core systems but poorly in heterogeneous ones. This paper accounts for the performance difference between cores in an asymmetric multi-core architecture. An improved hill-climbing method controls the prefetching aggressiveness of each core separately and thereby improves per-core IPC. Experiments show that, compared with the previous strategy, the average performance of the big core improves by more than 3% and the average performance of the little cores improves by more than 24%.
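The idea of tuning prefetch aggressiveness per core by hill climbing can be illustrated with a minimal sketch. It shows plain hill climbing only, not the improved variant the paper proposes, whose details are not given here. The function name `hill_climb_degree`, the `measure_ipc` callback, the degree range 0 to 8, and the toy IPC curves are all assumptions made for illustration.

```python
from typing import Callable


def hill_climb_degree(
    measure_ipc: Callable[[int], float],  # hypothetical: returns IPC observed at a given prefetch degree
    start: int = 4,
    lo: int = 0,
    hi: int = 8,
    epochs: int = 16,
) -> int:
    """Plain hill climbing over one core's prefetch degree (illustrative only)."""
    degree = start
    best_ipc = measure_ipc(degree)
    direction = 1  # +1 tries a more aggressive setting, -1 a less aggressive one
    for _ in range(epochs):
        candidate = max(lo, min(hi, degree + direction))
        if candidate == degree:
            # Hit the edge of the allowed range: turn around.
            direction = -direction
            continue
        ipc = measure_ipc(candidate)
        if ipc > best_ipc:
            # The step helped: keep it and continue in the same direction.
            degree, best_ipc = candidate, ipc
        else:
            # The step did not help: stay put and try the other direction.
            direction = -direction
    return degree


# Toy IPC curves (assumed, not measured): the big core peaks at a higher
# prefetch degree than the little core.
def big_core_ipc(degree: int) -> float:
    return 2.0 - 0.02 * (degree - 6) ** 2


def little_core_ipc(degree: int) -> float:
    return 0.8 - 0.02 * (degree - 2) ** 2


big_degree = hill_climb_degree(big_core_ipc)
little_degree = hill_climb_degree(little_core_ipc)
print(big_degree, little_degree)
```

Running one search per core, rather than applying a single global threshold, lets each core settle at its own aggressiveness level, which is the property that matters when big and little cores respond differently to prefetching.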