Backtracking-Based Instruction Scheduling to Fill Branch Delay Slots
Tóm tắt
Conventional schedulers schedule operations in dependence order and never revisit or undo a scheduling decision on any operation. In contrast, backtracking schedulers may unschedule operations and can often generate better schedules. This paper develops and evaluates the backtracking approach to fill branch delay slots. We first present the structure of a generic backtracking scheduling algorithm and prove that it terminates. We then describe two more aggressive backtracking schedulers and evaluate their effectiveness. We conclude that aggressive backtracking-based instruction schedulers can effectively improve schedule quality by eliminating branch delay slots with a small amount of additional computation.
Tài liệu tham khảo
M. S. Schlansker and B. R. Rau, EPIC: Explicitly Parallel Instruction Computing, Computer, 33(2):37–45 (2000).
S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann (1997).
P. B. Gibbons and S. S. Muchnick. Efficient Instruction Scheduling for a Pipelined Architecture, in ACM SIGPLAN Symposium on Compiler Construction (1986).
H. S. Warren, Instruction Scheduling for the IBM RISC System/6000 Processor, IBM Journal of Research and Development, 34(1):85–92 (1990).
J. C. Dehnert and R. A. Toole, Compiling for the Cydra-5, J. Supercomput., 7(1/2): 181–227 (1993).
M. Rim and R. Jain, Lower-bound Performance Estimation for the High-Level Synthesis Scheduling Problem, IEEE Transactions on CAD of ICS, 13(4)</del>:452–459 (1994).
M. Langevin and E. Cherny, A Recursive Technique for Computing Lower-Bound Performance of Schedules, ACM Transactions on Design Automaton of Electronic Systems, 1(4):443–455 (1996).
A. Eichenberger and W. M. Meleis, Balance Scheduling: Weighting Branch Tradeoffs in Superblocks, in International Symposium on Microachitecture Haifa, Israel (1999).
S. G. Abraham, Efficient Backtracking Instruction Schedulers, Technical Report HPL-2000-56, Hewlett-Packard Laboratories (2000). www.hpl.hp.com/techreports/2000/HPL-2000-56.html.
B. R. Rau, Iterative Modulo Scheduling, Int. J. Parallel Prog., 24(1):3–64 (1996).
V. Kathail, M. S. Schlansker, and B. R. Rau, HPL PlayDoh Architecture Specification: Version 1.0, Technical Report, Hewlett-Packard Laboratories (1991).
The Trimaran Compilation Infrastructure (1999). www.trimaran.org.
W. W. Hwu, S. A. Mahlke, W. Y. Chen, P. P. Chang, N. J. Warter, R. A. Bringmann, R. G. Ouellette, R. E. Hank, T. Kiyohara, G. E. Haab, J. G. Holm, and D. M. Lavery, The Superblock: An Effective Technique for VLIW and Superscalar Compilation, J. Supercomput., 7(1/2):229–248 (1993).
J. A. Fisher, Global Code Generation for Instruction-Level Parallelism: Trace Scheduling-2, Technical Report, Hewlett-Packard Laboratories (1993).
S. Davidson, D. Landskov, B. D. Shriver, and P. W. Mallet, Some Experiments in Local Microcode Compaction for Horizontal Machines, IEEE Transactions on Computers, C-30(7): 460–477 (1981).
B. R. Rau and J. A. Fisher, Instruction_Level Parallel Processing: History, Overview and Perspective, J. Supercomput., 7(1/2):9–50 (1993).
J. L. Hennessy and T. Gross, Postpass Code Optimization of Pipeline Constraints, ACM Transactions on Programming Languages and Systems, 5(3):422–448 (1983).
V. Bala and N. Rubin, Efficient Instruction Scheduling Using Finite State Automata, Int. J. Parallel Prog., 25(2):53–82 (1997).
J. Fisher, Trace Scheduling: A Technique for Global Microcode Compaction, IEEE Transactions on Computers, C-30(7):478–490 (1981).
P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg, The Multiflow Trace Scheduling Compiler, J. Supercomput., 7(1/2):51–142 (1993).
J. Ferrante, K. J. Ottenstein, and J. D. Warren, The Program Dependence Graph and Its Use in Optimization, ACM Transactions on Programming Languages and Systems, 9(3):319–349 (1987).
D. Bernstein and M. Rodeh, Global Instruction Scheduling for Superscalar Processor, in ACM SIGPLAN Symposium on Programming Languages Design and Implementation, Toronto, Canada (1991).
S. M. Moon and K. Ebcioglu, An Efficient Resource_Constrained Global Scheduling Method for Superscalar and VLIW Processors, in International Symposium on Microachitecture, Portland, Oregon (1992).
P. P. Chang, S. A. Mahlke, and W. W. Hwu, Using Profile Information to Assist Classic Code Optimization, Software Practice and Experience, 21(12):1301–1321 (1991).
S. A. Mahlke, D. C. Lin, W. Y. Chen, R. E. Hank, and R. A. Bringmann, Effective Compiler Support for Predicated Execution Using the Hyperblock, in International Symposium on Microachitecture, Portland, Oregon (1992).
J. Bharadwaj, K. Menezes, and C. McKinsey, Wavefront Scheduling: Path Based Data Representation and Scheduling of Subgraphs, in International Symposium on Microarchitecture, Haifa, Israel (1999).
S. G. Abraham, V. Kathail, and B. L. Dietrich, Meld Scheduling: A Technique for Relaxing Scheduling Constraints, Int. J. Parallel Prog., 26(4):349–381 (1998).
Y. Wang, N. Amato, and D. Friesen, Hindsight Helps: Deterministic Task Scheduling with Backtracking, in International Conference on Parallel Processing, Bloomington, Illinois (1997).
J. Hoogerbrugge and L. Augusteijn, Instruction Scheduling for TriMedia, J. Instruction-Level Parallelism (1) (1999).
A. E. Charlesworth, An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family, Computer, 14(9):18–27 (1981).
B. R. Rau, D. W. L. Yen, W. Yen, and R. A. Towle, The Cydra 5 Departmental Supercomputer: Design Philosophies, Decisions and Trade_Offs, Computer, 22(1):12–34 (1989).