Amdahl's law in the context of heterogeneous many‐core systems – a survey

IET Computers and Digital Techniques, Volume 14, Issue 4, Pages 133-148, 2020

Mohammed A. Noaman Al‐hayanni¹,², Fei Xia², Ashur Rafiev³, Alexander Romanovsky³, Rishad Shafik², Alex Yakovlev²

¹School of Electrical Engineering, University of Technology, P.O. Box 19006, Baghdad, Iraq

²School of Engineering, Newcastle University, Newcastle upon Tyne NE1 7RU, UK

³School of Computing, Newcastle University, Newcastle upon Tyne NE4 5TG, UK

