Elastic computing: A portable optimization framework for hybrid computers
References
Advanced Micro Devices, Inc., AMD Accelerated Processing Units, 2012. <http://fusion.amd.com/>.
J. Ansel, C. Chan, Y.L. Wong, M. Olszewski, Q. Zhao, A. Edelman, S. Amarasinghe, PetaBricks: a language and compiler for algorithmic choice, in: Proceedings of ACM SIGPLAN Conference Programming Language Design and Implementation, 2009, pp. 38–49.
C. Augonnet, S. Thibault, R. Namyst, P. Wacrenier, StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, Concurrency Comput.: Practice Experience 23 (2) (2011), pp. 187–198.
Austin, 2002, SimpleScalar: an infrastructure for computer system modeling, Computer, 35, 59, 10.1109/2.982917
Barron, 1994, Performance of optical flow techniques, Int. J. Comput. Vision, 12, 43, 10.1007/BF01420984
Bentley, 1979, Algorithms for reporting and counting geometric intersections, IEEE Trans. Comput., C-28, 643, 10.1109/TC.1979.1675432
B. Bond, K. Hammil, L. Litchev, S. Singh, FPGA circuit synthesis of accelerator data-parallel programs, in: Proceedings of 18th IEEE Annual International Symposium Field-Programmable Custom Computing Machines, 2010, pp. 167–170.
Buck, 2004, Brook for GPUs: stream computing on graphics hardware, ACM Trans. Graphics, 23, 777, 10.1145/1015706.1015800
Cooper, 2001, Adaptive optimizing compilers for the 21st century, J. Supercomputing, 23, 7, 10.1023/A:1015729001611
Craven, 2007, Examining the viability of FPGA supercomputing, EURASIP J. Embedded Syst., 2007, 13, 10.1186/1687-3963-2007-093652
R. Datta, J. Li, J.Z. Wang, Content-based image retrieval: approaches and trends of the new age, in: Proceedings of 7th ACM SIGMM International Workshop Multimedia Information Retrieval, 2005, pp. 253–262.
Davis, 1975, A comparison of heuristic and optimum solutions in resource-constrained project scheduling, Manage. Sci., 21, 944, 10.1287/mnsc.21.8.944
Dean, 2008, MapReduce: simplified data processing on large clusters, Comm. ACM, 51, 107, 10.1145/1327452.1327492
DeHon, 2000, The density advantage of configurable computing, Computer, 33, 41, 10.1109/2.839320
Y. Dong, Y. Dou, J. Zhou, Optimized generation of memory structure in compiling window operations onto reconfigurable hardware, in: Proceedings of Third International Conference Reconfigurable Computing: Architectures, Tools, and Applications, 2007, pp. 110–121.
A.E. Eichenberger, K. O’Brien, P. Wu, T. Chen, P.H. Oden, D.A. Prener, J.C. Shepherd, B. So, Z. Sura, A. Wang, T. Zhang, P. Zhao, M. Gschwind, Optimizing compiler for the Cell processor, in: Proceedings of 14th International Conference Parallel Architectures and Compilation Techniques, 2005, pp. 161–172.
Eker, 2003, Taming heterogeneity – the Ptolemy approach, Proc. IEEE, 91, 127, 10.1109/JPROC.2002.805829
Eles, 1997, System level hardware/software partitioning based on simulated annealing and Tabu search, Design Autom. Embedded Syst., 2, 5, 10.1023/A:1008857008151
Feng, 2007, The Green500 list: encouraging sustainable supercomputing, Computer, 40, 50, 10.1109/MC.2007.445
M. Frigo, S.G. Johnson, FFTW: an adaptive software architecture for the FFT, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 1998, pp. 1381–1384.
George, 2011, Novo-G: at the forefront of scalable reconfigurable supercomputing, Comput. Sci. Eng., 13, 82, 10.1109/MCSE.2011.11
Girkar, 1995, Extracting task-level parallelism, ACM T. Progr. Lang. Syst., 17, 600, 10.1145/210184.210189
B. Grattan, G. Stitt, F. Vahid, Codesign-extended applications, in: Proceedings of Tenth International Symposium on Hardware/Software Codesign, 2002, pp. 1–6.
E. Grobelny, C. Reardon, A. Jacobs, A. George, Simulation framework for performance prediction in the engineering of reconfigurable systems and applications, in: Proceedings of International Conference Engineering Reconfigurable Systems and Algorithms, 2007, pp. 124–130.
Z. Guo, W. Najjar, F. Vahid, K. Vissers, A quantitative analysis of the speedup factors of FPGAs over processors, in: Proceedings of ACM/SIGDA 12th International Symposium on Field Programmable Gate Arrays, 2004, pp. 162–170.
S. Gupta, N. Dutt, R. Gupta, A. Nicolau, SPARK: a high-level synthesis framework for applying parallelizing compiler transformations, in: Proceedings of 16th International Conference VLSI Design, 2003, pp. 461–466.
B. Holland, K. Nagarajan, C. Conger, A. Jacobs, A.D. George, RAT: a methodology for predicting performance in application design migration to FPGAs, in: Proceedings of First International Workshop High-Performance Reconfigurable Computing Technology and Applications, 2007, pp. 1–10.
P. Husbands, C. Iancu, K. Yelick, A performance analysis of the Berkeley UPC compiler, in: Proceedings of 17th Annual International Conference Supercomputing, 2003, pp. 63–73.
IBM, The Cell Project, 2012. <http://www.research.ibm.com/cell/>.
Ierotheou, 2001, The semi-automatic parallelisation of scientific application codes using a computer aided parallelisation toolkit, Scientific Programming, 9, 163, 10.1155/2001/327048
Impulse Accelerated Technologies, C-to-FPGA Tools, 2012. <http://www.impulsec.com/products_universal.htm>.
Intel Corporation, Intel Software Network – Code & Downloads, 2012. <http://software.intel.com/en-us/articles/code-downloads/>.
Intel Corporation, Many Integrated Core (MIC) Architecture, 2012. <http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html>.
A. Ismail, L. Shannon, FUSE: front-end user framework for O/S abstraction of hardware accelerators, in: Proceedings of IEEE 19th Annual International Symposium Field-Programmable Custom Computing Machines, 2011, pp. 170–177.
Khronos Group, OpenCL, 2012. <http://www.khronos.org/opencl/>.
Knijnenburg, 2002
Li, 1999, Performance estimation of embedded software with instruction cache modeling, ACM Trans. Design Autom. Electronic Syst., 4, 257, 10.1145/315773.315778
C. Luk, S. Hong, H. Kim, Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, in: Proceedings of 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009, pp. 45–55.
Macedonia, 2003, The GPU enters computing’s mainstream, Computer, 36, 106, 10.1109/MC.2003.1236476
G. Madl, N. Dutt, S. Abdelwahed, Performance estimation of distributed real-time embedded systems by discrete event simulations, in: Proceedings of Seventh ACM & IEEE International Conference on Embedded Software, 2007, pp. 183–192.
M.D. McCool, RapidMind Inc., Data-Parallel Programming on the Cell BE and the GPU Using the RapidMind Development Platform, Presented at the GSPx Multicore Applications Conference, Santa Clara, CA, October/November 2006.
Mentor Graphics, Catapult C Synthesis Overview, 2012. <http://www.mentor.com/products/c-based_design/catapult_c_synthesis/index.cfm>.
S.G. Merchant, B.M. Holland, C. Reardon, A.D. George, H. Lam, G. Stitt, M.C. Smith, N. Alam, I. Gonzalez, E. El-Araby, P. Saha, T. El-Ghazawi, H. Simmler, Strategic challenges for application development productivity in reconfigurable computing, in: Proceedings of IEEE National Aerospace and Electronics Conference, 2008, pp. 209–218.
Mercury Federal Systems, Inc., OpenCPI – Open Component Portability Infrastructure, 2012. <http://opencpi.org/>.
Micheli, 1994
Moore, 2007, Vforce: an extensible framework for reconfigurable supercomputing, Computer, 40, 39, 10.1109/MC.2007.110
Musser, 1997, Introspective sorting and selection algorithms, Software Pract. Exper., 27, 983, 10.1002/(SICI)1097-024X(199708)27:8<983::AID-SPE117>3.0.CO;2-#
NSF Center for High-Performance Reconfigurable Computing (CHREC), FPGA Tool-Flow Studies Workshop, 2012. <http://www.chrec.org/ftsw/>.
Nudd, 2000, PACE: a toolset for the performance prediction of parallel and distributed systems, Int. J. High Perform. Comput. Appl., 14, 228, 10.1177/109434200001400306
NVIDIA Corporation, NVIDIA Developer Zone – CUDA Downloads, 2012. <http://www.nvidia.com/object/cuda_develop.html>.
NVIDIA Corporation, NVIDIA Developer Zone – CUDA Toolkit 3.2 Downloads, 2012. <http://developer.nvidia.com/cuda-toolkit-32-downloads>.
I. Ouaiss, S. Govindarajan, V. Srinivasan, M. Kaul, R. Vemuri, An integrated partitioning and synthesis system for dynamically reconfigurable multi-FPGA architectures, in: Proceedings of 12th International Parallel Processing Symposium, and Ninth Symposium Parallel and Distributed Processing, 1998, pp. 31–36.
S. Pai, R. Govindarajan, M.J. Thazhuthaveetil, PLASMA: portable programming for SIMD heterogeneous accelerators, presented at: First Workshop on Language, Compiler, and Architecture Support for GPGPU, 2010.
M. Palesi, T. Givargis, Multi-objective design space exploration using genetic algorithms, in: Proceedings of Tenth International Symposium on Hardware/Software Codesign, 2002, pp. 67–72.
P.R. Panda, SystemC – a modeling platform supporting multiple design abstractions, in: Proceedings of 14th International Symposium on System Synthesis, 2001, pp. 75–80.
W. Pfeiffer, N.J. Wright, Modeling and predicting application performance on parallel computers using HPC challenge benchmarks, in: Proceedings of IEEE International Symposium on Parallel and Distributed Processing, 2008, pp. 1–12.
Puschel, 2005, SPIRAL: code generation for DSP transforms, Proc. IEEE, 93, 232, 10.1109/JPROC.2004.840306
H. Quinn, L.A.S. King, M. Leeser, W. Meleis, Runtime assignment of reconfigurable hardware components for image processing pipelines, in: Proceedings of 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003, pp. 173–182.
C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, C. Kozyrakis, Evaluating MapReduce for multi-core and multiprocessor systems, in: Proceedings of IEEE 13th International Symposium on High Performance Computer Architecture, 2007, pp. 13–24.
Reardon, 2010, A simulation framework for rapid analysis of reconfigurable computing systems, ACM Trans. Reconfigurable Technol. Syst., 3, 25:1, 10.1145/1862648.1862655
Semeria, 2001, Synthesis of hardware models in C with pointers and complex data structures, IEEE Trans. Very Large Scale Integration Syst., 9, 743, 10.1109/92.974889
A. Snavely, L. Carrington, N. Wolter, J. Labarta, R. Badia, A. Purkayastha, A framework for performance modeling and prediction, in: Proceedings of ACM/IEEE Conference on Supercomputing, 2002, pp. 21–21.
Stitt, 2002, Energy advantages of microprocessor platforms with on-chip configurable logic, IEEE Design Test Comput., 19, 36, 10.1109/MDT.2002.1047742
G. Stitt, F. Vahid, W. Najjar, A code refinement methodology for performance-improved synthesis from C, in: Proceedings of IEEE/ACM International Conference on Computer-Aided Design, 2006, pp. 716–723.
TOP500.Org, Power Consumption of Supercomputers – June 2008, 2012. <http://www.top500.org/lists/2008/06/highlights/power>.
TOP500.Org, TOP500 List – June 2010, 2012. <http://www.top500.org/list/2010/06/100>.
TOP500.Org, Tianhe-1 – NUDT TH-1 Cluster, 2012. <http://www.top500.org/system/10186>.
Vuduc, 2005, OSKI: a library of automatically tuned sparse matrix kernels, J. Phys.: Conf. Ser., 16, 521, 10.1088/1742-6596/16/1/071
J.R. Wernsing, G. Stitt, Elastic computing: a framework for transparent, portable, and adaptive multi-core heterogeneous computing, in: Proceedings of ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, 2010, pp. 115–124.
Whaley, 2001, Automated empirical optimization of software and the ATLAS project, Parallel Comput., 27, 3, 10.1016/S0167-8191(00)00087-9
Williams, 2010, Characterization of fixed and reconfigurable multi-core devices for application acceleration, ACM Trans. Reconfigurable Technol. Syst., 3, 19:1, 10.1145/1862648.1862649
Xilinx Inc., Intellectual Property (IP) Cores, 2012. <http://www.xilinx.com/products/intellectual-property/index.htm>.