Fine-Grained Power Modeling of Multicore Processors Using FFNNs

Mark Sagi1, Nguyen Anh Vu Doan1, Nael Fasfous1, Thomas Wild1, Andreas Herkersdorf1
1Technical University of Munich, Munich, Germany

Abstract

To minimize power consumption while maximizing performance, today's multicore processors rely on fine-grained run-time dynamic power information, both in the time domain (e.g., µs to ms) and in the space domain (e.g., core level). The state of the art for deriving such power information is mainly based on predetermined power models that use linear modeling techniques to capture the core-performance/core-power relationship. However, as multicore processors become ever more complex, linear modeling techniques can no longer capture all possible core-performance-related power states. Although artificial neural networks (ANNs) have been proposed for coarse-grained power modeling of servers with time resolutions in the range of seconds, few works have investigated fine-grained ANN-based power modeling. In this paper, we explore feed-forward neural networks (FFNNs) for core-level power modeling with estimation rates in the range of 10 kHz. To achieve high estimation accuracy while minimizing run-time overhead, we propose a multi-objective optimization of the neural architecture using NSGA-II, with the FFNNs being trained on performance-counter and power data from a complex out-of-order processor architecture. We show that the relative power estimation error of the highest-accuracy FFNN decreases on average by 7.5% compared to a state-of-the-art linear power modeling approach and by 5.5% compared to a multivariate polynomial regression model. For the FFNNs optimized for both accuracy and overhead, the average error decreases by between 4.1% and 6.7% compared to linear modeling, while incurring significantly lower overhead than the highest-accuracy FFNN. Furthermore, we propose a micro-controller-based and an accelerator-based implementation for run-time inference of the power modeling FFNN and show that the area overhead is negligible.
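To make the modeling approach concrete, the sketch below shows a minimal FFNN that maps a per-core vector of performance-counter readings to an estimated core power value. It is an illustration only, not the authors' exact model: the counter names, layer sizes, activation function, and training loop are assumptions, and PyTorch is used purely for convenience. The hidden-layer widths are the kind of architectural knobs that the NSGA-II search described above would explore when trading off estimation error against inference overhead (e.g., the number of weights).

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's exact model):
# an FFNN estimating per-core power from performance-counter deltas.
import torch
import torch.nn as nn

# Hypothetical counter selection; the paper's feature set may differ.
COUNTERS = ["instructions", "branch_misses", "l1d_misses", "l2_misses", "cycles"]

class CorePowerFFNN(nn.Module):
    def __init__(self, n_inputs=len(COUNTERS), hidden=(16, 8)):
        super().__init__()
        layers, width = [], n_inputs
        for h in hidden:                     # hidden sizes are the knobs a
            layers += [nn.Linear(width, h),  # multi-objective NAS would tune
                       nn.ReLU()]
            width = h
        layers.append(nn.Linear(width, 1))   # single output: core power in W
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

def train(model, X, y, epochs=200, lr=1e-3):
    """X: (samples, counters) normalized counter values per estimation window,
       y: (samples, 1) reference core power for the same windows."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model

# Placeholder usage with random data standing in for counter/power traces.
model = CorePowerFFNN()
X = torch.rand(1000, len(COUNTERS))
y = torch.rand(1000, 1) * 5.0
train(model, X, y)
n_weights = sum(p.numel() for p in model.parameters())  # simple overhead proxy
```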

References

ARM Limited: Cortex-M0 technical reference manual. Technical report (2009)
Bertran, R., Gonzalez, M., Martorell, X., Navarro, N., Ayguade, E.: A systematic methodology to generate decomposable and responsive power models for CMPs. IEEE Trans. Comput. (2013)
Bienia, C.: Benchmarking modern multiprocessors (2011)
Bircher, W.L., John, L.K.: Complete system power estimation using processor performance events. IEEE Trans. Comput. (2012)
Carlson, T.E., Heirman, W., Eyerman, S., Hur, I., Eeckhout, L.: An evaluation of high-level mechanistic core models. ACM TACO (2014)
Chadha, M., Ilsche, T., Bielert, M., Nagel, W.E.: A statistical approach to power estimation for x86 processors. In: Proceedings of the 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017 (2017)
Chen, X., Dick, R.P., Mao, Z.M.: Performance and power modeling in a multi-programmed multi-core environment, pp. 813–818 (2010)
Chu, X., Zhang, B., Xu, R.: Multi-objective reinforced evolution in mobile neural architecture search. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision: ECCV 2020 Workshops, pp. 99–113. Springer, Cham (2020)
Cupertino, L.F., Da Costa, G., Pierson, J.M.: Towards a generic power estimator. Comput. Sci. Res. Develop. (2014)
Deb, K., Agrawal, R.B.: Simulated binary crossover for continuous search space. Technical report (1994)
Deb, K., Agrawal, S.: A niched-penalty approach for constraint handling in genetic algorithms. In: Artificial Neural Nets and Genetic Algorithms, pp. 235–243. Springer, Vienna (1999)
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolut. Comput. 6(2), 182–197 (2002). https://doi.org/10.1109/4235.996017
Huang, G.B., Chen, L., Siew, C.K.: Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. (2006)
Huang, W., Lefurgy, C., Kuk, W., Buyuktosunoglu, A., Floyd, M., Rajamani, K., Allen-Ware, M., Brock, B.: Accurate fine-grained processor power proxies. In: IEEE/ACM MICRO (2012)
Kim, Y., Mercati, P., More, A., Shriver, E., Rosing, T.: P4: Phase-based power/performance prediction of heterogeneous systems via neural networks. In: IEEE/ACM ICCAD (2017)
Li, S., Ahn, J.H., Strong, R.D., Brockman, J.B., Tullsen, D.M., Jouppi, N.P.: McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures. In: IEEE MICRO (2009)
Lin, W., Wu, G., Wang, X., Li, K.: An artificial neural network approach to power consumption model construction for servers in cloud data centers. IEEE Trans. Sustain. Comput. (2019)
Lu, Z., Whalen, I., Boddeti, V., Dhebar, Y., Deb, K., Goodman, E., Banzhaf, W.: NSGA-Net: neural architecture search using multi-objective genetic algorithm. In: Proceedings of the Genetic and Evolutionary Computation Conference, GECCO '19, pp. 419–427. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3321707.3321729
McCullough, J.C., Agarwal, Y., Chandrashekar, J., Kuppuswamy, S., Snoeren, A.C., Gupta, R.K.: Evaluating the effectiveness of model-based power characterization. In: USENIX ATC (2011)
Möbius, C., Dargie, W., Schill, A.: Power consumption estimation models for processors, virtual machines, and servers. IEEE TPDS (2014)
Pathania, A., Henkel, J.: HotSniper: Sniper-based toolchain for many-core thermal simulations in open systems. IEEE Embedd. Syst. Lett. (2019)
Rapp, M., Pathania, A., Mitra, T., Henkel, J.: Prediction-based task migration on S-NUCA many-cores. In: DATE (2019)
Rapp, M., Sagi, M., Pathania, A., Herkersdorf, A., Henkel, J.: Power- and cache-aware task mapping with dynamic power budgeting for many-cores. IEEE Trans. Comput. 69(1), 1–13 (2020). https://doi.org/10.1109/TC.2019.2935446
Rethinagiri, S.K., Palomar, O., Ben Atitallah, R., Niar, S., Unsal, O., Kestelman, A.C.: System-level power estimation tool for embedded processor based platforms. In: ACM RAPIDO (2014)
Sagi, M., Vu Doan, N.A., Fasfous, N., Wild, T., Herkersdorf, A.: Fine-grained power modeling of multicore processors using FFNNs. In: Orailoglu, A., Jung, M., Reichenbach, M. (eds.) Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 186–199. Springer, Cham (2020)
Samei, Y., Dömer, R.: Automated estimation of power consumption for rapid system level design. In: IEEE IPCCC (2014)
Shahid, A., Fahad, M., Manumachu, R.R., Lastovetsky, A.: Improving the accuracy of energy predictive models for multicore CPUs using additivity of performance monitoring counters. In: Parallel Computing Technologies (2019)
Su, B., Gu, J., Shen, L., Huang, W., Greathouse, J.L., Wang, Z.: PPEP: online performance, power, and energy prediction framework and DVFS space exploration. In: IEEE/ACM MICRO (2014)
Umuroglu, Y., Fraser, N.J., Gambardella, G., Blott, M., Leong, P., Jahre, M., Vissers, K.: FINN: a framework for fast, scalable binarized neural network inference. In: Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA '17, pp. 65–74. ACM (2017)
Vidnerová, P., Neruda, R.: Multi-objective evolution for deep neural network architecture search. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing, pp. 270–281. Springer, Cham (2020)
Walker, M.J., Diestelhorst, S., Hansson, A., Das, A.K., Yang, S., Al-Hashimi, B.M., Merrett, G.V.: Accurate and stable run-time power modeling for mobile and embedded CPUs. IEEE TCAD (2017)
Woo, S.C., Ohara, M., Torrie, E.: The SPLASH-2 programs: characterization and methodological considerations. In: ACM ISCA (1995)
Wu, W., Lin, W., He, L., Wu, G., Hsu, C.H.: A power consumption model for cloud servers based on Elman neural network. IEEE Trans. Cloud Comput. (2019)