The multiflow trace scheduling compiler

P. Geoffrey Lowney1, Stefan M. Freudenberger2, Thomas J. Karzes3, Woody Lichtenstein4, Robert Nix5, John O'Donnell6, John Ruttenberg7
1Digital Equipment Corporation, HLO2-3/J03, Hudson
2Hewlett-Packard Labs, Palo Alto, USA
3D.E. Shaw and Co., New York, USA
4Thinking Machines Corp., Cambridge
5Digital Equipment Corporation, Boxborough
6Equator Technologies, Seattle
7Silicon Graphics, Milford, USA

Tóm tắt

Từ khóa


Tài liệu tham khảo

Aho, A.V., and Ullman, J.D. 1977.Principles of Compiler Design. Addison-Wesley, Reading, Mass.

Aiken, A., and Nicolau, A. 1988. Optimal loop parallelization. InProc., SIGPLAN '88 Conf. on Programming Language Design and Implementation (Atlanta, June), pp. 308?317.

Allen, F.E., Carter, J.L., Fabri, J., Ferrante, J., Harrison, W.H., Loewner, P.G., and Trevillyan, L.H. 1980. The experimental compiling system.IBM J. Res. and Dev., 24, 6 (Nov.): 695?715.

Allen, J.R., and Kennedy, K. 1984. Automatic loop interchange. InProc., ACM SIGPLAN '84 Symp. on Compiler Construction (Montreal, June), pp. 233?246.

Allen, R., and Kennedy, K. 1987. Automatic translation of FORTRAN programs to vector form.ACM Trans. Programming Languages and Systems, 9, 4 (Oct.): 491?542.

American National Standards. 1978. American National Standard Programming Language Fortran. Am. Nat. Standards Institute, New York.

Banerjee, U. 1976. Data dependence in ordinary programs. M.S. thesis, Dept. of Comp. Sci. Rept. no. 76-837, Univ. of Ill., Urbana-Champaign, Ill.

Banerjee, U. 1979. Speedup of ordinary programs. Ph.D. thesis, Dept. of Comp. Sci. Rept. no. 79?989, Univ. of Ill., Urbana-Champaign, Ill.

Callahan, D., and Koblenz, B. 1991. Register allocation via hierarchical graph coloring. InProc., ACM SIGPLAN '91 Conf. on Programming Language Design and Implementation (Toronto, June), pp. 192?203.

Callahan, D., Carr, S., and Kennedy, K. 1990. Improving register allocation for subscripted variables. InProc., ACM SIGPLAN '90 Conf. on Programming Languge Design and Implementation (White Plains, N.Y., June), pp. 53?65.

Chaitin, G.J. 1982. Register allocation and spilling via graph coloring. InProc., ACM SIGPLAN '82 Symp. on Compiler Construction (Boston, June), pp. 98?105.

Chaitin, G.J., Auslander, M.A., Chandra, A.K., Cocke, J., Hopkins, M.E., and Markstein, P.W. 1981. Register allocation via coloring.Comput. Lang., 6, 1 (Jan.): 47?57.

Chastain, M., Gostlin, G., Mankovich, J., and Wallach, S. 1988. The Convex C240 architecture. InProc., IEEE 1988 Supercomputing Conf., pp. 321?329.

Chow, F.C. 1983. A portable machine-independent global optimizer?Design and measurements. Tech. Rept. 83-254, Comp. Systems Lab., Stanford Univ., Stanford, Calif.

Chow, F., and Hennessy, J. 1984. Register allocation by priority-based coloring. InProc., ACM SIGPLAN '84 Symp. on Compiler Construction (Montreal, June), pp. 222?232.

Chow, F.C., and Hennessy, J.L. 1990. The priority-based coloring approach to register allocation.ACM Trans. Programming Languages and Systems, 12, 4 (Oct.): 501?536.

Chow, F., Himelstein, M., Killian, E., and Weber, L. 1986. Engineering a RISC compiler system. InProc., IEEE Compcon.

Chow, F., Correll, S., Himelstein, M., Killian, E., and Weber, L. 1987. How many addressing modes are enough? InProc., Second Internat. Conf. on Architectural Support for Programming Languages and Operating Systems (Palo Alto, Calif., Oct.), pp. 117?121.

Colwell, R.P., Nix, R.P., O'Donnell, J.J., Papworth, D.B., and Rodman, P.K. 1988. A VLIW architecture for a trace scheduling compiler.IEEE Trans. Comps., C-37, 8 (Aug.): 967?979.

Colwell, R.P., Hall, W.E., Joshi, C.S., Papworth, D.B., Rodman, P.K., and Tornes, J.E. 1990. Architecture and implementation of a VLIW supercomputer. InProc., Supercomputing '90 (Nov.), pp. 910?919.

Dehnert, J.C., Hsu, P.Y.-T., and Bratt, J. P. 1989. Overlapped loop support in the Cydra 5. InProc., Third Internat. Conf. on Architectural Support for Programming Languages and Operating Systems (Boston, Apr.), pp. 26?38.

Dongarra, J.J. 1991. Performance of various computers using standard linear equations software. CS-89-85, Dept. of Comp. Sci., Univ. of Tenn., Knoxville, Tenn.

Ellis, J.R. 1986.Bulldog: A Compiler for VLIW Architectures. MIT Press, Cambridge, Mass.

Ferrante, J., Ottenstein, K.J., and Warren, J.D. 1987. The program dependence graph and its use in optimization.ACM Trans. Programming Languages and Systems, 9, 3, (July): 319?349.

Feldman, S.I. 1979. Implementation of a portable F77 compiler using modern tools. InProc., SIGPLAN Symp. on Compiler Construction (Aug.), pp. 98?106.

Fisher, J.A. 1979. The optimization of horizontal microcode within and beyond basic blocks: An application of processor scheduling with resources. Ph.D. thesis, New York Univ., New York.

Fisher, J.A. 1981. Trace scheduling: A technique for global microcode compaction.IEEE Trans. Comps., C-30, 7 (July): 478?490.

Fisher, J.A., Ellis, J.R., Ruttenberg, J.C. and Nicolau, A. 1984. Parallel processing: A smart compiler and a dumb machine. InProc., ACM SIGPLAN '84 Symp. on Compiler Construction (Montreal, June), pp. 37?47.

Foster, C.C., and Riseman, E.M. 1972. Percolation of code to enhance parallel dispatching and execution.IEEE Trans. Comps., C-21, 12 (Dec): 1411?1415.

Freudenberger, S.M., and Ruttenberg, J.C. 1992. Phase ordering of register allocation and instruction scheduling. InCode Generation?Concepts, Tools, Techniques (R. Giergerich and S.L. Graham, eds.), Springer-Verlag, London, pp. 146?170.

Gibbons, P.B., and Muchnik, S.S. 1986. Efficient instruction scheduling for a pipelined architecture. InProc., SIGPLAN '86 Symp. on Compiler Construction (Palo Alto, Calif., June), pp. 11?16.

Gross, T. 1983. Code optimization of pipeline constraints. Tech. Rept. no. 83-255, Comp. Systems Lab., Stanford Univ., Stanford, Calif.

Gross, T., and Ward, M. 1990. The suppression of compensation code. InAdvances in Languages and Compilers for Parallel Computing (A. Nicolau, D. Gelernter, T. Gross, and D. Padua, eds.), Pitman/MIT Press, pp. 260?273.

Hennessy, J.L., and Gross, T.R. 1983. Postpass code optimization of pipeline constraints.ACM Trans. Programming Languages and Systems, 5, 3 (July): 422?448.

Hewlett-Packard. 1991.PA-RISC Procedure Calling Convention Reference Manual. Hewlett-Packard Laboratories, order no. 09740-90015.

Himelstein, M.I. 1991. Compiler tail ends. Tutorial notes,SIGPLAN '91 Conf. on Programming Language Design and Implementation.

IBM. 1992.IBM RS/6000 Assembler Language Reference, 2nd ed. IBM Corp. (Jan.).

Johnson, S.C. 1979. A tour through the portable C compiler. Bell Labs., Murray Hill, N.J.

Joshi, C.S., Reger, B.A., and Feehrer, J.R. 1991. Memory system for a statically scheduled supercomputer. InProc., 1991 Internat. Conf. on Parallel Processing, vol. 1, pp. 196?203.

Jouppi, N.P., and Wall, D.W. 1989. Available instruction-level parallelism for superscalar and superpipelined machines. InProc., Third Internat. Conf. on Architectural Support for Programming Languages and Operating Systems (Boston, Apr.), pp. 272?182.

Kane, G. 1987.MIPS R2000 RISC Architecture. Prentice-Hall, Englewood Cliffs, N.J.

Kernighan, B.W., and Ritchie, D.M. 1978.The C Programming Language. Prentice-Hall, Englewood Cliffs, N.J.

Knuth, D.E. 1981.The Art of Computer Programming, vol. 2, Seminumerical Algorithms, 2nd ed. Addison-Wesley, Reading, Mass.

Kuck, D.J., Kuhn, R.H., Padua, D.A., Leasure, B., and Wolfe, M. 1981. Dependence graphs and compiler optimizations. InProc., Eighth Annual ACM Symp. on Principles of Programming Languages (Williamsburg, Va., Jan.), pp. 207?218.

Lam, M. 1988. Software pipelining: An effective scheduling technique for VLIW machines. InProc., SIGPLAN '88 Conf. on Programming Design and Implementation (Atlanta, June 22?24), pp. 318?328.

Lam, M.S. 1989.A Systolic Array Optimizing Compiler. Kluwer Academic, Norwell, Mass.

Linn, J.L. 1983. SRDAG compaction?A generalization of trace scheduling to increase the use of global context information. InProc., 16th Annual Microprogramming Workshop (Oct.), pp. 11?22.

Linn, J.L. 1988. Horizontal microcode compaction. InMicroprogramming and Firmware Engineering Methods (S. Habib, ed.), Van Nostrand Rheinhold, New York, pp. 381?431.

Little, J.D.C. 1961. A proof of the queing formulaL = ?W.Operations Research, 9, 3 (May): 383?387. (Cited in Coffman, E.G., Jr., and Denning, P.J. 1973.Operating Systems Theory. Prentice-Hall, Englewood Cliffs, N.J.)

Lubeck, O.M. 1988. Supercomputer performance: The theory, practice, and results. LA-11204-MS, Los Alamos Nat. Lab., Los Alamos, N.M.

McMahon, F.H. 1986. The Livermore Fortran kernels: A computer test of the numerical performance range. UCRL-53745, Lawrence Livermore Nat. Lab., Univ. of Calif., Livermore.

Mercer, R. 1988. The CONVEX FORTRAN 5.0 compiler. InProc., Supercomputing '88 (St. Petersburg, Fla., Nov.), pp. 164?175.

Morel, E., and Renvoise, C. 1979. Global optimization by suppression of partial redundancies.CACM, 22, 2 (Feb.): 96?103.

Nicolau, A. 1984. Parallelism, memory anti-aliasing and correctness for trace scheduling compilers. Ph.D. thesis, Yale Univ., New Haven, Conn.

Nicolau, A., and Fisher, J. A. 1981. Using an oracle to measure parallelism in single instruction stream programs. InProc., 14th Annual Microprogramming Workshop (Oct.), pp. 171?182.

O'Brien, K., Hay, B., Minish, J., Schaffer, H., Schloss, B., Shepherd, A., and Zaleski, M. 1990. Advanced compiler technology for the RISC System/6000 architecture. InIBM RISC System/6000 Technology. IBM Corp., Austin, Tex., pp. 154?161.

Odnert, D., Hansen, R., Dadoo, M., and Laventhal, M. 1991. Architecture and compiler enhancements for PA-RISC workstations. InProc., IEEE Compcon (spring), pp. 214?218.

Padua, D.A., and Wolfe, M.J. 1986. Advanced compiler optimizations for supercomputers.CACM, 29, 12 (Dec): 1184?1201.

Pettis, K., and Hansen, R.C. 1990. Profile guided code positioning. InProc., ACM SGPLAN90 Conf. on Programming Language Design and Implementation (White Plains, N.Y., June), pp. 16?27.

Rau, B.R., and Glaeser, C.D. 1981. Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing. InProc., Fourteenth Annual Workshop on Microprogramming (Oct.), pp. 183?198.

Rau, B.R., Yen, D.W.L., Yen, W., and Towle, R.A. 1989. The Cydra 5 departmental supercomputer: Design philosophies, decisions, and trade-offs.IEEE Computer, 22, 1 (Jan.): 12?34.

Reif, J.H. 1977. Combinatorial aspects of symbolic program analysis. Tr-11-77, Center for Research in Computing Technology, Cambridge, Mass.

Reif, J.H. 1980. Code motion.SIAM J. Computing, 9, 2 (May): 375?395.

Reif, J.H., and Tarjan, R.E. 1982. Symbolic analysis in almost linear time.SIAM J. Computing, 11, 1 (Feb.): 81?93.

Rodman, P. 1989. High performance FFTs for a VLIW architecture. InProc., 1989 Symp. on Computer Architecture and Digital Signal Processing (Hong Kong).

SPEC. 1989.SPEC Newsletter, 1, 1 (fall). (Published by Waterside Assoc, Fremont, Calif.)

Thornton, J.E. 1970.Design of a Computer: The Control Data 6600. Scott, Foresman, Glenview, Ill.

Tirumalai, P., Lee, M., and Schlansker, M. S. 1990. Parallelization of loops with exits on pipelined architectures. InProc., Supercomputing '90 (New York, Nov.), pp. 200?212.

Tjaden, G.S., and Flynn, M.J. 1970. Detection and parallel execution of independent instructions.IEEE Trans. Comps., C-19, 10 (Oct.): 889?895.

Tomasulo, R.M. 1982. An efficient algorithm for exploiting multiple arithmetic units. InComputer Structures: Principles and Examples, McGraw-Hill, New York, pp. 293?305.

Wall, D.W. 1991. Limits of instruction-level parallelism. InProc., Fourth Internat. Conf. on Architectural Support for Programming Languages and Operating Systems (Santa Clara, Calif., Apr.), pp. 176?188.

Warren, H.S. 1990. Instruction scheduling for the IBM RISC System/6000 processor.IBM J. Res. and Dev., 34, 1, (Jan.): 85?92.

Wolfe, M. 1989.Optimizing Supercompilers for Supercomputers. MIT Press, Cambridge, Mass.