Parallelization of Control Recurrences for ILP Processors

International Journal of Computer & Information Sciences - Tập 24 - Trang 65-102 - 2016

Michael Schlansker¹, Vinod Kathail¹, Sadun Anik¹

¹Computer Research Center, Hewlett-Packard Laboratories, Palo Alto, USA

Tóm tắt

The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. The utility of these techniques can be demonstrated in an increasingly broad range of important situations. This paper focuses on the height reduction of control recurrences within loops with data dependent exits. Loops with exits are transformed so as to alleviate performance bottlenecks resulting from control dependences. A compilation approach to effect these transformations is described. The techniques presented in this paper used in combination with prior work on reducing the height of data dependences provide a comprehensive approach to accelerating loops with conditional exits. In many cases, loops with conditional exits provide a degree of parallelism traditionally associated with vectorization. Multiple iterations of a loop can be retired in a single cycle on a processor with adequate instruction level parallelism with no cost in code redundancy. In more difficult cases, height reduction requires redundant computation or may not be feasible.

Tài liệu tham khảo

J. Ferrante, K. Ottenstein, and J. Warren, The program dependence graph and its use in optimization, ACM Trans. on Programming Language Systems 9(3):319–349 (1987). A. Nicolau and J. A. Fisher, Measuring the parallelism available for very long instruction word architectures, IEEE Trans. on Computers C-33(11):968–976 (1984). N. P. Jouppi and D. Wall, Available instruction level parallelism for superscalar and superpiplined machines, Proc. of the Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Massachusetts, pp. 272–282 (1989). D. W. Wall, Limits of instruction-level paralleiism, Proc. of the Fourth Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, California, pp. 176–188 (1991). M. S. Lam and R. P. Wilson, Limits of control flow on parallelism, Proc. of the 19th Int. Symp. on Computer Architecture, Gold Coast, Australia, pp. 46–57 (1992). D. J. Kuck, The Structure of Computers and Computations, Wiley, New York, 1978. M. Schlansker and V. Kathail, Acceleration of first and higher order recurrences on processors with instruction level parallelism, In Sixth Int. Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., Springer-Verlag, pp. 406–429 (1993). J. A. Fisher, Trace scheduling: A technique for global microcode compaction, IEEE Trans. on Computers C-30, 7:478–490 (1981). P. Tirumalai, M. Lee, and M. S. Schlansker, Parallelization of loops with exits on pipelined architectures, Proc. of the Super Computing ’90, New York, pp. 200–212 (1990). W.-M. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellete, R. E. Hank, T. Kiyohara, G. E. Habb, J. G. Holm, and D. M. Lavery, The Superblock: An effective technique for VLIW and superscalar compilation, The Journal of Super Computing 7(1/2):229–248 (1993). S. A. Mahlke, W. Y. Chen, R. A. Bringmann, R. E. Hank, W. W. Hwu, B. R. Rau, and M. S. Schlansker, Sentinel scheduling: A model for compiler-controlled speculative execution. ACM Trans. on Computer Systems 11(4):376–408 (1993). J. A. Fisher, Global code generation for instruction-level parallelism: trace scheduling-2, Technical Report HPL-93-43, Hewlett-Packard Laboratories, Palo Alto, California, 1993. A. Nicolau, Parallelism, memory-anti-aliasing and correctness for trace scheduling compilers, Ph. D. Thesis. Yale University, 1984. J. R. Ellis, Bulldog: A Compiler for VLIW Architectures MIT Press, Cambridge, Massachussetts (1985). S.-M. Moon and K. Ebcioglu, An efficient resource-constrained global scheduling technique for superscalar and VLIW processors, Proc. of the 25th Ann. Int. Symp. on Microarchitecture, Portland, Oregon (1992). G. Lowney, S. Freudenberger, T. Karzes, W. D. Lichtenstein, R. Nix, J. O’Donnell, and J. Ruttenberg, The Multiflow Trace Scheduling Compiler, The Journal of Super Computing 7(1/2):51–142 (1993). P. Y. T. Hsu and E. S. Davidson, Highly concurrent scalar processing, Proc. of the Thirteenth Ann. Int. Symp. on Computer Architecture, Tokyo, Japan, pp. 386–395 (1986). B. R. Rau, Cydra 5 directed dataflow architecture, Proc. of the COMPCON ’88, San Francisco, California, pp. 106–113 (1988). B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 departmental supercomputer: Design philosophies, decisions and trade-offs, Computer 22(1): 12–35 (1989). J. C. H. Park and M. S. Schlansker, On predicated execution, Technical Report HPL-91-58, Hewlett-Packard Laboratories, Palo Alto California, 1991. V. Kathail, M. S. Schlansker, and B. R. Rau, HPL PlayDoh architecture specification: Version 1.0, Technical Report HPL-93-80, Hewlett-Packard Laboratories, Palo Alto, California (1993). B. R. Rau, Data flow and dependence analysis for instruction level paralleiism, In Fourth Int. Workshop on Languages and Compilers for Parallel Computing, U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds., Springer-Verlag, pp. 236–250 (1992). B. R. Rau and C. D. Glaeser, Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific Computing, Proc. of the Fourteenth Ann. Workshop on Microprogramming, pp. 183–198 (1981). J. C. Dehnert, P. Y. T. Hsu, and J. P. Bratt, Overlapped Loop Support on the Cydra 5, Proc. of the Third Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Boston, Massachusetts, pp. 26–38 (1989). J. C. Dehnert and R. A. Towle, Compiling for the Cydra 5, The Journal of Supercomputing 7(1/2):181–228 (1993). N. J. Warter, S. A. Mahlke, W. W. Hwu, and B. R. Rau, Reverse if-conversion, Proc. of the SIGPLAN ’93 Conf. on Programming Language Design and Implementation, Albuquerque, New Mexico, pp. 290–299 (1993). M. Schlansker, V. Kathail, and S. Anik, Height reduction of control recurrences for ILP processors, Proc. of the 27th Ann. Int. Symp. on Microarchitecture, San Jose, California, pp. 40–51 (1994). B. R. Rau, M. S. Schlansker, and P. P. Tirumalai, Code generation Schemas for modulo scheduled DO-loops and WHILE-loops, Technical Report HPL-92-47, Hewlett-Packard Laboratories, Palo Alto California (1992). F. H. McMahon, The Livermore Fortran Kernels: A computer test of the numerical performance range, Technical Report UCRL-53745, Lawrence Livermore National Laboratory (1986). H.-P. Company, PA-RISC 1.1 Architecture and Instruction Set Reference Manual, Second Edition Hewlett-Packard (1992). S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective compiler support for predicated execution using the hyperblock, Proc. of the 25th Ann. Int. Symp. on Microarchitecture, Portland, Oregon, pp. 45–54 (1992).

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA