HPC node performance and energy modeling with the co-location of applications

Springer Science and Business Media LLC - Volume 72, Issue 12, Pages 4771-4809 - 2016

Dauwe, Daniel¹, Jonardi, Eric¹, Friese, Ryan D.¹, Pasricha, Sudeep¹,², Maciejewski, Anthony A.¹, Bader, David A.³, Siegel, Howard Jay¹,²

¹Department of Electrical and Computer Engineering, Colorado State University, Fort Collins, USA

²Department of Computer Science, Colorado State University, Fort Collins, USA

³College of Computing, Georgia Institute of Technology, Atlanta, USA

Abstract

Multicore processors have become an integral part of modern large-scale and high-performance parallel and distributed computing systems. Unfortunately, applications co-located on multicore processors can suffer from decreased performance and increased dynamic energy use as a result of interference in shared resources, such as memory. As this interference is difficult to characterize, assumptions about application execution time and energy usage can be misleading in the presence of co-location. Consequently, it is important to accurately characterize the performance and energy usage of applications that execute in a co-located manner on these architectures. This work investigates some of the disadvantages of co-location and presents a methodology for building models that can use varying amounts of information about a target application and its co-located applications to predict the target application’s execution time and the system’s energy use under arbitrary co-locations of a wide range of application types. The proposed methodology is validated on three different server-class Intel Xeon multicore processors using eleven applications from two scientific benchmark suites. The model’s utility for scheduling is also demonstrated in a simulated large-scale high-performance computing environment through the creation of a co-location-aware scheduling heuristic. This heuristic demonstrates that scheduling with information generated by the proposed modeling methodology can yield significant improvements over a scheduling heuristic that is oblivious to co-location interference.
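To make the scheduling idea in the abstract concrete, the following is a minimal illustrative sketch and not the authors' model or heuristic. It assumes a deliberately simple linear interference model in which each application has a solo execution time, a memory intensity, and a sensitivity to co-runner memory pressure. All names and parameters (`App`, `Node`, `predicted_time`, `place_aware`, `place_oblivious`, `mem_intensity`, `sensitivity`) are hypothetical. The sketch compares a first-fit placement that ignores interference with a greedy placement that picks the node where the predicted total execution time grows the least.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class App:
    """Hypothetical application description (not taken from the paper)."""
    name: str
    base_time: float      # execution time when running alone, in seconds
    mem_intensity: float  # pressure this app puts on shared memory (assumed 0..1)
    sensitivity: float    # fractional slowdown per unit of co-runner pressure


@dataclass
class Node:
    cores: int
    apps: list = field(default_factory=list)

    def has_free_core(self) -> bool:
        return len(self.apps) < self.cores


def predicted_time(target: App, co_located: list) -> float:
    """Toy linear model: slowdown grows with the summed pressure of co-runners."""
    pressure = sum(a.mem_intensity for a in co_located)
    return target.base_time * (1.0 + target.sensitivity * pressure)


def node_total_time(apps: list) -> float:
    """Sum of predicted execution times for every app sharing one node."""
    return sum(
        predicted_time(app, apps[:i] + apps[i + 1:])
        for i, app in enumerate(apps)
    )


def place_oblivious(app: App, nodes: list) -> Node:
    """First-fit placement that ignores co-location interference."""
    for node in nodes:
        if node.has_free_core():
            node.apps.append(app)
            return node
    raise RuntimeError("no free core available")


def place_aware(app: App, nodes: list) -> Node:
    """Greedy placement: choose the node whose predicted total time grows least."""
    candidates = [n for n in nodes if n.has_free_core()]
    if not candidates:
        raise RuntimeError("no free core available")
    best = min(
        candidates,
        key=lambda n: node_total_time(n.apps + [app]) - node_total_time(n.apps),
    )
    best.apps.append(app)
    return best


def schedule(apps: list, cores_per_node: int, num_nodes: int, place) -> float:
    """Place apps in arrival order; return the summed predicted execution time."""
    nodes = [Node(cores_per_node) for _ in range(num_nodes)]
    for app in apps:
        place(app, nodes)
    return sum(node_total_time(n.apps) for n in nodes)


if __name__ == "__main__":
    # Made-up workload: two memory-bound and two compute-bound applications.
    workload = [
        App("mem_a", 100.0, 0.8, 0.5),
        App("mem_b", 100.0, 0.8, 0.5),
        App("cpu_a", 100.0, 0.1, 0.05),
        App("cpu_b", 100.0, 0.1, 0.05),
    ]
    print("oblivious:", schedule(workload, 2, 2, place_oblivious))
    print("aware:    ", schedule(workload, 2, 2, place_aware))
```

In this toy workload the oblivious policy pairs the two memory-bound applications on the same node, while the aware policy tends to pair each with a compute-bound one. A real model of the kind the abstract describes would be fitted to measured data and would also predict energy use, which this sketch omits.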

