Fast, contention-free combining tree barriers for shared-memory multiprocessors

Michael L. Scott1, John M. Mellor-Crummey2
1Computer Science Department, University of Rochester, Rochester
2Computer Science Department, Rice University, Houston

Tóm tắt

In a previous article,(1) Gupta and Hill introduced anadaptive combining tree algorithm for busy-wait barrier synchronization on shared-memory multiprocessors. The intent of the algorithm was to achieve a barrier in logarithmic time when processes arrive simultaneously, and in constant time after the last arrival when arrival times are skewed. Afuzzy (2) version of the algorithm allows a process to perform useful work between the point at which it notifies other processes of its arrival and the point at which it waits for all other processes to arrive. Unfortunately, adaptive combining tree barriers as originally devised perform a large amount of work at each node of the tree, including the acquisition and release of locks. They also perform an unbounded number of accesses to nonlocal locations, inducing large amounts of memory and interconnect contention. We present new adaptive combining tree barriers that eliminate these problems. We compare the performance of the new algorithms to that of other fast barriers on a 64-node BBN Butterfly 1 multiprocessor, a 35-node BBN TC2000, and a 126-node KSR 1. The results reveal scenarios in which our algorithms outperform all known alternatives, and suggest that both adaptation and the combination of fuzziness with tree-style synchronization will be of increasing importance on future generations of shared-memory multiprocessors.

Tài liệu tham khảo

R. Gupta and C. R. Hill, A Scalable Implementation of Barrier Synchronization Using an Adaptive Combining Tree,IJPP 18(3):161–180 (June 1989). R. Gupta, The Fuzzy Barrier: A Mechanism for High Speed Synchronization of Processors,Proc. of the Third Int. Conf. on Archit. Support for Progr. Lang. and Oper. Syst., pp. 54–63 (April 1989). D. Hensgen, R. Finkel, and U. Manber, Two Algorithms for Barrier Synchronization.IJPP 17(1):1–17 (1988). B. Lubachevsky, Synchronization Barrier and Related Tools for Shared Memory Parallel Programming,Proc. of the Int. Conf. on Parallel Processing II, pp. 175–179 (August 1989). J. M. Mellor-Crummery and M. L. Scott, Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors,ACM Trans. on Computer Systems 9(1):21–65 (February 1991). P.-C Yew, N.-F. Tzeng, and D. H. Lawrie, Distributing Hot-Spot Addressing in Large-Scale Multiprocessors,IEEE Trans. on Computers C-36(4):388–395 (April 1987). E. D. Brooks III, The Butterfly Barrier,IJPP 15(4):295–307 (1986). T. E. Anderson, The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors,IEEE Trans. on Parallel and Distributed Systems 1:1:6–16 (January 1990). G. Graunke and S. Thakkar, “Synchronization Algorithms for Shared-Memory Multiprocessors,Computer 23(6):60–69 (June 1990). W. C. Hsieh and W. E. Weihl, Scalable Reader-Writer Locks for Parallel Systems, MIT/LCS/TR-521, Laboratory for Computer Science, MIT (November 1991). A. C. Lee, Barrier Synchronization over Multistage Interconnection Networks,Proc. of the Second IEEE Symp. on Parallel and Distributed Processing, pp. 130–133 (December 1990). J. M. Mellor-Crummery and M. L. Scott, Synchronization Without Contention,Proc. of the Fourth Int. Conf. on Archit. Support for Progr. Lang. and Oper. Syst., pp. 269–278 (April 1991). J. M. Mellor-Crummey and M. L. Scott, Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors,Proc. of the Third ACM Symp. on Principles and Practice of Parallel Programming, pp. 106–113 (April 1991). M. Herlihy, Wait-Free Synchronization,ACM Transactions on Programming Languages and Systems 13(1):124–149 (January 1991). T. Johnson and D. Shasha, A Framework for the Performance Analysis of Concurrent B-tree Algorithms,Proc. of the Ninth ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pp. 273–287 (April 1990). P. L. Lehman and S. B. Yao, Efficient Locking for Concurrent Operations on B-Trees,ACM Trans, on Database Systems 6(4):650–670 (December 1981). Y. Sagiv, Concurrent Operations on B*-Trees with Overtaking,J. of Comp. and Syst. Sci. 33(2):275–296 (October 1986). E. P. Markatos and T. J. LeBlanc, Shared-Memory Multiprocessor Trends and the Implications for Parallel Program Performance, TR 420, Computer Science Department, University of Rochester (May 1992). T. H. Dunigan, Kendall Square Multiprocessor: Early Experiences and Performance, ORNL/TM-12065, Oak Ridge National Laboratory (May 1992).