Amdahl's law in the context of heterogeneous many‐core systems – a survey
References
Sun X.H. Ni L.M.: ‘Another view on parallel speedup’. Proc. Super Computing ‘90 New York NY USA November 1990 pp.324–333
Al‐hayanni M.A.N. Rafiev A. Shafik R. et al.: ‘Power and energy normalized speedup models for heterogeneous many core computing’. 2016 16th Int. Conf. on Application of Concurrency to System Design ACSD Torun Poland June 2016 pp.84–93
Rafiev A., 2018, Speedup and power scaling models for heterogeneous many‐core systems, IEEE Trans. Multi‐Scale Comput. Syst., 4, 436, 10.1109/TMSCS.2018.2791531
Rabaey J., 2012, Low power design methodologies
Xia F., 2017, Voltage, throughput, power, reliability, and multicore scaling, Computer, 50, 34, 10.1109/MC.2017.3001246
Eyerman S., 2011, Fine‐grained DVFS using on‐chip regulators, ACM Trans. Archit. Code Optim., 8, 1:1, 10.1145/1952998.1952999
Moore G.E., 2006, Cramming more components onto integrated circuits, reprinted from electronics, volume 38, number 8, April 19, 1965, pp.114 ff, IEEE Solid‐State Circuits Soc. Newsl., 11, 33, 10.1109/N-SSC.2006.4785860
Borkar S.: ‘Thousand core chips: a technology perspective’. Proc. of the 44th annual Design Automation Conf. San Diego CA USA June 2007 pp.746–749
Sridharan S. Gupta G. Sohi G.S.: ‘Adaptive efficient parallel execution of parallel programs’. Proc. of the 35th ACM SIGPLAN Conf. on Programming Language Design and Implementation ser. PLDI'14 Edinburgh UK 2014 pp.169–180. Available at http://doi.acm.org/10.1145/2594291.2594292
Al‐hayanni M.A.N. Shafik R. Rafiev A. et al.: ‘Speedup and parallelization models for energy‐efficient many‐core systems using performance counters’. 2017 Int. Conf. on High Performance Computing Simulation (HPCS) Genova Italy July 2017 pp.410–417
Intel Intel 64 and IA‐32 Architectures Software Developer's Manual. Volume 3B: System Programming Guide Part 2. Intel September 2016. Available at http://www.intel.co.uk/content/www/uk/en/architecture‐and‐technology/64‐ia‐32‐architectures‐software‐developer‐vol‐3b‐part‐2‐manual.html
Morad T.Y., 2006, Performance, power efficiency and scalability of asymmetric cluster chip multiprocessors, IEEE Comput. Archit. Lett., 5, 4, 10.1109/L-CA.2006.6
Sato T. Mori H. Yano R. et al.: ‘Importance of single‐core performance in the multicore era’. Proc. of the Thirty‐fifth Australasian Computer Science Conf. ‐ Volume 122 ser. ACSC ‘12 Darlinghurst NSW Australia 2012 pp.107–114. Available at http://dl.acm.org/citation.cfm?id=2483654.2483667
Zidenberg T., 2012, Multi‐Amdahl: how should I divide my heterogeneous chip?, IEEE Comput. Archit. Lett., 11, 65, 10.1109/L-CA.2012.3
Morad A., 2014, Generalized multi‐Amdahl: optimization of heterogeneous multi‐accelerator SoC, IEEE Comput. Archit. Lett., 13, 37, 10.1109/L-CA.2012.34
Intel 2017. Available at https://www.intel.co.uk/content/www/uk/en/homepage.html
Mercelis S.: ‘A systematic multi‐layered approach for optimizing and parallelizing real‐time media and audio applications’. Ph.D. dissertation University of Antwerp 2016
Yun Y., 2019, Estimation of maximum speed‐up in multicore‐based mobile devices, IEEE Embedded Syst. Lett., 11, 62, 10.1109/LES.2018.2873018
Woo D.H., 2008, Extending Amdahl's law for energy‐efficient computing in the many‐core era, Computer, 41, 24, 10.1109/MC.2008.494
Cameron K.W., 2012, Generalizing Amdahl's law for power and energy, Computer, 45, 75, 10.1109/MC.2012.92
Marowka A.: ‘Extending Amdahl's law for heterogeneous computing’. 2012 IEEE 10th Int. Symp. on Parallel and Distributed Processing with Applications Madrid Spain July 2012 pp.309–316
Issa J. Figueira S.: ‘Performance and power‐consumption analysis of mobile internet devices’. 30th IEEE Int. Performance Computing and Communications Conf. London UK November 2011 pp.1–6
Gupta U., 2016, A generic energy optimization framework for heterogeneous platforms using scaling models, Microprocess. Microsyst., 40, 74, 10.1016/j.micpro.2015.06.009
Wilson L.F. Shen W.: ‘Experiments in load migration and dynamic load balancing in SPEEDES’. 1998 Winter Simulation Conf. Proc. (Cat. No.98CH36274) Washington DC USA December 1998 vol. 1 pp.483–490
Ryou J.C. Wong J.S.K.: ‘A task migration algorithm for load balancing in a distributed system’. Proc. of the Twenty‐Second Annual Hawaii Int. Conf. on System Sciences: Software Track Kailua‐Kona HI USA January 1989 vol. 2 pp.1041–1048
Johari S. Kumar A.: ‘Algorithmic approach for applying load balancing during task migration in multi‐core system’. 2014 Int. Conf. on Parallel Distributed and Grid Computing Solan Himachal Pradesh India December 2014 pp.27–32
Li Y. Niu J. Long X. et al.: ‘Energy efficient scheduling with probability and task migration considerations for soft real‐time systems’. 2014 IEEE Computers Communications and IT Applications Conf. Beijing People's Republic of China October 2014 pp.287–293
Cormen T.H., 2009, Introduction to algorithms
Agrawal K. He Y. Hsu W. et al.: ‘Adaptive scheduling with parallelism feedback’. 2007 IEEE Int. Parallel and Distributed Processing Symp. Rome Italy March 2007 pp.1–7
Cassidy A.S., 2012, Beyond Amdahl's law: an objective function that links multiprocessor performance gains to delay and energy, IEEE Trans. Comput., 61, 1110, 10.1109/TC.2011.169
Rafiev A. Al‐hayanni M.A.N. Xia F. et al.: ‘Extending multi‐fraction speedup models to normal form heterogeneity’. Tech. Rep. NCL‐EEE‐MICRO‐TR‐2018‐202 μSystems Research Group School of Engineering Newcastle University 2018
Londono S.M. deGyvez J.P.: ‘Extending Amdahl's law for energy‐efficiency’. 2010 Int. Conf. on Energy Aware Computing Cairo Egypt December 2010 pp.1–4
Ofenbeck G. Steinmann R. Cabezas V.C. et al.: ‘Applying the roofline model’. IEEE Int. Symp. on Performance Analysis of Systems and Software (ISPASS) Monterey CA USA 2014 pp.76–85
Gupta U. Campbell J. Ogras U.Y. et al.: ‘Adaptive performance prediction for integrated GPUs’. Proc. of the 35th Int. Conf. on Computer‐Aided Design ser. ICCAD ‘16 Austin TX USA 2016 pp.61:1–61:8. Available at http://doi.acm.org/10.1145/2966986.2966997
Yao E., 2011, What Hill–Marty model learn from and break through Amdahl's law?, Inf. Process. Lett., 111, 1092, 10.1016/j.ipl.2011.09.009
Zidenberg T., 2013, Optimal resource allocation with multi‐Amdahl, Computer, 46, 70, 10.1109/MC.2012.359
Rogers B.M., 2009, Scaling the bandwidth wall: challenges in and avenues for CMP scaling, SIGARCH Comput. Archit. News, 37, 371, 10.1145/1555815.1555801
Chen X. Lu Z. Jantsch A. et al.: ‘Speedup analysis of data‐parallel applications on multi‐core NoCs’. IEEE 8th Int. Conf. on ASIC Changsha People's Republic of China October 2009 pp.105–108
Kumar R. Zyuban V. Tullsen D.M.: ‘Interconnections in multi‐core architectures: understanding mechanisms overheads and scaling’. 32nd Int. Symp. on Computer Architecture (ISCA'05) Madison WI USA June 2005 pp.408–419
Rodrigues E.R. Madruga F.L. Navaux P.O.A. et al.: ‘Multi‐core aware process mapping and its impact on communication overhead of parallel applications’. 2009 IEEE Symp. on Computers and Communications Sousse Tunisia July 2009 pp.811–817
Ahmad T.B. Ciesielski M.: ‘An approach to multi‐core functional gate‐level simulation minimizing synchronization and communication overheads’. 2013 14th Int. Workshop on Microprocessor Test and Verification Austin TX USA December 2013 pp.77–82
Li X. Malek M.: ‘Analysis of speedup and communication/computation ratio in multiprocessor systems’. Proc. Real‐Time Systems Symp. Huntsville AL USA December 1988 pp.282–288
Pei S., 2016, Extending Amdahl's law for heterogeneous multicore processor with consideration of the overhead of data preparation, IEEE Embedded Sys. Lett., 8, 26, 10.1109/LES.2016.2519521
Pei S., 2016, Reevaluating the overhead of data preparation for asymmetric multicore system on graphics processing, KSII Trans. Internet Inf. Syst., 10, 3231
Sun X.‐H. Chen Y. Byna S.: ‘Scalable computing in the multicore era’. Proc. of the Int. Symp. on Parallel Architectures Algorithms and Programming Sydney NSW Australia 2008
Moncrieff D., 1996, Heterogeneous computing machines and Amdahl's law, Parallel Comput., 22, 407, 10.1016/0167-8191(95)00071-2
Yao E., 2009, Extending Amdahl's law in the multicore era, SIGMETRICS Perform. Eval. Rev., 37, 24, 10.1145/1639562.1639571
Juurlink B., 2012, Amdahl's law for predicting the future of multicores considered harmful, SIGARCH Comput. Archit. News, 40, 1, 10.1145/2234336.2234338
Ye N. Hao Z. Xie X.: ‘The speedup model for many core processor’. 2013 Int. Conf. on Information Science and Cloud Computing Companion Guangzhou People's Republic of China December 2013 pp.469–474
Blem E., 2013, Multicore model from abstract single core inputs, IEEE Comput. Archit. Lett., 12, 59, 10.1109/L-CA.2012.27
Loh G.H.: ‘The cost of uncore in throughput‐oriented many‐core processors’. Proc. of Workshop on Architectures and Languages for Throughput Applications (ALTA) Beijing People's Republic of China 2008 pp.1–9
Khanyile N.P., 2012, An analytic model for predicting the performance of distributed applications on multicore clusters, IAENG Int. J. Comp. Sci., 39, 312
Khanyile N.P. Tapamo J.‐R. Dube E.: ‘Performance prediction model for distributed applications on multicore clusters’ Proceedings of the World Congress on Engineering London U.K July 4–6 2012 Vol II WCE 2012
Huang T., 2013, Extending Amdahl's law and Gustafson's law by evaluating interconnections on multi‐core processors, J. Supercomput., 66, 305, 10.1007/s11227-013-0908-9
Chung E.S. Milder P.A. Hoe J.C. et al.: ‘Single‐chip heterogeneous computing: does the future include custom logic FPGAs and GPGPUs?’. 2010 43rd Annual IEEE/ACM Int. Symp. on Microarchitecture Atlanta GA USA December 2010 pp.225–236
Che H., 2014, Amdahl's law for multithreaded multicore processors, J. Parallel Distrib. Comput., 74, 3056, 10.1016/j.jpdc.2014.06.012
Tang S. Lee B.S. He B.: ‘Speedup for multi‐level parallel computing’. 2012 IEEE 26th Int. Parallel and Distributed Processing Symp. Workshops PhD Forum Shanghai People's Republic of China May 2012 pp.537–546
Lee S. Kim S.H. Ro W.W.: ‘Multicore speedup models using frequency scaling with fixed power budget’. 2014 Int. Conf. on Electronics Information and Communications (ICEIC) Kota Kinabalu Malaysia January 2014 pp.1–2
Pusukuri K.K. Gupta R. Bhuyan L.N.: ‘Thread reinforcer: dynamically determining number of threads via OS level monitoring’. 2011 IEEE Int. Symp. on Workload Characterization (IISWC) Austin TX USA November 2011 pp.116–125
Sasaki H. Imamura S. Inoue K.: ‘Coordinated power‐performance optimization in many cores’. Proc. of the 22nd Int. Conf. on Parallel Architectures and Compilation Techniques Edinburgh UK September 2013 pp.51–61
Al‐Babtain B.M., 2013, A survey on Amdahl's law extension in multicore architectures, Int. J. New Comput. Archit. Their Appl., 3, 30
Ayoub R. Ogras U. Gorbatov E. et al.: ‘OS‐level power minimization under tight performance constraints in general purpose systems’. IEEE/ACM Int. Symp. on Low Power Electronics and Design Fukuoka Japan August 2011 pp.321–326
Casey S.D.: ‘How to determine the effectiveness of hyper‐threading technology with an application’ 2011. Available at https://software.intel.com/en‐us/articles/how‐to‐determinethe‐effectiveness‐of‐hyper‐threading‐technology‐with‐an‐application
McKee S.A., 2011, Memory wall
Zurawski R., 2009, Embedded systems handbook, second edition: embedded systems design and verification, 10.1201/9781439807637
Quinn M.J., 2003, Parallel programming in C with MPI and OpenMP
Vipin K., 2018, FPGA dynamic and partial reconfiguration: a survey of architectures, methods, and applications, ACM Comput. Surv., 51, 72:1, 10.1145/3193827
Kumar R. Tullsen D.M. Ranganathan P. et al.: ‘Single‐ISA heterogeneous multi‐core architectures for multithreaded workload performance’. Proc. 31st Annual Int. Symp. on Computer Architecture 2004 München Germany June 2004 pp.64–75
Arenas M.G., 2011, GPU computation in bioinspired algorithms: a review, 433
Lee V.W. Grochowski E. Geva R.: ‘Performance benefits of heterogeneous computing in HPC workloads’. 2012 IEEE 26th Int. Parallel and Distributed Processing Symp. Workshops PhD Forum Shanghai People's Republic of China May 2012 pp.16–26
Kumar R. Farkas K.I. Jouppi N.P. et al.: ‘Single‐ISA heterogeneous multi‐core architectures: the potential for processor power reduction’. Proc. 36th Annual IEEE/ACM Int. Symp. on Microarchitecture 2003. MICRO‐36 San Diego CA USA December 2003 pp.81–92
Greenhalgh P.: ‘White paper: big.LITTLE processing with ARM Cortex‐A15 and Cortex‐A7 – improving energy efficiency in high‐performance mobile platforms’ ARM 2011. Available at https://www.cl.cam.ac.uk/rdm34/big.LITTLE.pdf
Venkat A. Tullsen D.M.: ‘Harnessing ISA diversity: design of a heterogeneous‐ISA chip multiprocessor’. 2014 ACM/IEEE 41st Int. Symp. on Computer Architecture (ISCA) Minneapolis MN USA June 2014 pp.121–132
Power J. Basu A. Gu J. et al.: ‘Heterogeneous system coherence for integrated CPU‐GPU systems’. 2013 46th Annual IEEE/ACM Int. Symp. on Microarchitecture (MICRO) Davis CA USA December 2013 pp.457–467
Pathania A. Jiao Q. Prakash A. et al.: ‘Integrated CPU‐GPU power management for 3D mobile games’. Proc. of the 51st Annual Design Automation Conf. ser. DAC ‘14 San Francisco CA USA 2014 pp.40:1–40:6. Available at http://doi.acm.org/10.1145/2593069.2593151
Mujtaba H.: ‘Intel Skylake GPU architecture analysis’ 2015. Available at https://wccftech.com/idf15‐intel‐skylake‐analysis‐cpu‐gpumicroarchitecture‐ddr4‐memory‐impact/3/
Palacios J. Triska J.: ‘A comparison of modern GPU and CPU architectures: and the common convergence of both’ Oregon State University 2011. Available at https://hgpu.org/?p=6610
Cullinan C. Wyant C. Frattesi T. et al.: ‘Computing performance benchmarks among CPU GPU and FPGA’ Worcester Polytechnic Institute 2012. Available at https://web.wpi.edu/Pubs/E‐project/Available/Eproject‐030212‐123508/unrestricted/Benchmarking_Final.pdf
Duda K.J., 1999, Borrowed‐virtual‐time BVT scheduling: supporting latency‐sensitive threads in a general‐purpose scheduler, SIGOPS Oper. Syst. Rev., 33, 261, 10.1145/319344.319169
Agarwal A., 1995, The MIT alewife machine: architecture and performance, SIGARCH Comput. Archit. News, 23, 2, 10.1145/225830.223985
Boothe B., 1992, Improved multithreading techniques for hiding communication latency in multiprocessors, SIGARCH Comput. Archit. News, 20, 214, 10.1145/146628.139729
The international technology roadmap for semiconductors ITRS 2017. Available at http://www.itrs2.net/
Leiserson C.: ‘What the $#@! is parallelism anyhow?’ 2017. Available at https://www.cprogramming.com/parallelism.html
Han S. Yun Y. Kim Y.H.: ‘Profiling‐based task graph extraction on multiprocessor system‐on‐chip’. 2016 IEEE Asia Pacific Conf. on Circuits and Systems (APCCAS) Jeju Republic of Korea October 2016 pp.510–513
Deng X. Dymond P.: ‘On multiprocessor system scheduling’. Proc. of the Eighth Annual ACM Symp. on Parallel Algorithms and Architectures ser. SPAA ‘96 Padua Italy 1996 pp.82–88. Available at http://doi.acm.org/10.1145/237502.237510
Gupta U., 2015, Constrained energy optimization in heterogeneous platforms using generalized scaling models, IEEE Comput. Archit. Lett., 14, 21, 10.1109/LCA.2014.2326603
Kim W. Gupta M.S. Wei G.Y. et al.: ‘System level analysis of fast per‐core DVFS using on‐chip switching regulators’. 2008 IEEE 14th Int. Symp. on High Performance Computer Architecture Salt Lake City UT USA February 2008 pp.123–134
Aalsaud A. Rafiev A. Xia F. et al.: ‘Model‐free runtime management of concurrent workloads for energy‐efficient many‐core heterogeneous systems’. 28th Int. Symp. on Power and Timing Modeling Optimization and Simulation (PATMOS) Platja d'Aro Spain 2018 pp.206–213