The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors
Abstract
Keywords
References
D. Jiang and J. P. Singh, Scaling Application Performance on a Cache-Coherent Multiprocessor, Proc. 26th Int'l. Symp. Computer Architecture (ISCA'99), Atlanta, Georgia, pp. 305-316 (May 1999).
Y. Solihin, V. Lam, and J. Torrellas, Scal-Tool: Pinpointing and Quantifying Scalability Bottlenecks in DSM Multiprocessors, Proc. ACM-IEEE Supercomputing: High Performance Networking and Computing Conf. (SC'99), Portland, Oregon (November 1999).
T. Anderson, The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors, IEEE Trans. Parallel Distributed Syst., 1(1):6-16 (January 1990).
A. Gupta, A. Tucker, and S. Urushibara, The Impact of Operating System Scheduling Policies and Synchronization Methods on the Performance of Parallel Applications, Proc. ACM SIGMETRICS Conf. Measurement and Modeling of Computer Systems (SIGMETRICS'91), San Diego, California, pp. 120-132 (June 1991).
J. Mellor-Crummey and M. Scott, Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors, ACM Trans. Computer Syst., 9(1):21-65 (February 1991).
S. Kumar, D. Jiang, R. Chandra, and J. P. Singh, Evaluating Synchronization on Shared Address Space Multiprocessors: Methodology and Performance, Proc. ACM SIGMETRICS Conf. Measurement and Modeling of Computer Systems (SIGMETRICS'99), Atlanta, Georgia, pp. 23-24 (May 1999).
D. Nikolopoulos and T. Papatheodorou, A Quantitative Architectural Evaluation of Synchronization Algorithms and Disciplines on ccNUMA Systems: The Case of the SGI Origin2000, Proc. 13th ACM Int'l. Conf. Supercomputing (ICS'99), Rhodes, Greece, pp. 319-328 (June 1999).
A. Kägi and J. Goodman, Efficient Synchronization: Let Them Eat QOLB, Proc. 24th Int'l. Symp. Computer Architecture (ISCA'97), Denver, Colorado, pp. 171-181 (June 1997).
C. Kuo, J. Carter, and R. Kuramkote, MP-LOCKs: Replacing H/W Synchronization Primitives with Message Passing, Proc. Fifth Int'l. Symp. High Performance Computer Architecture (HPCA-5), Orlando, Florida, pp. 284-288 (January 1999).
R. Rajwar, A. Kägi, and J. Goodman, Improving the Throughput of Synchronization by Insertion of Delays, Proc. Sixth Int'l. Symp. High Performance Computer Architecture (HPCA-6), Toulouse, France, pp. 156-165 (January 2000).
P. Trancoso and J. Torrellas, The Impact of Speeding Up Critical Sections with Data Prefetching and Forwarding, Proc. 1996 Int'l. Conf. Parallel Processing (ICPP'96), Bloomingdale, Illinois, pp. 79-86 (August 1996).
P. Diniz and M. Rinard, Eliminating Synchronization Overhead in Automatically Parallelized Programs Using Dynamic Feedback, ACM Trans. Computer Syst., 17(2):89-132 (February 1999).
S. Fu and N. Tzeng, A Circular List-Based Mutual Exclusion Scheme for Large Shared-Memory Multiprocessors, IEEE Trans. Parallel Distributed Syst., 8(6):628-639 (June 1997).
T. Huang, Fast and Fair Mutual Exclusion for Shared Memory Systems, Proc. 19th IEEE Int'l. Conf. Distributed Computing Systems (ICDCS'99), Austin, Texas, pp. 224-231 (1999).
B. Lim and A. Agarwal, Reactive Synchronization Algorithms for Multiprocessors, Proc. Sixth Int'l. Conf. Architectural Support for Progr. Lang. Oper. Syst. (ASPLOS-VI), San Jose, California, pp. 25-35 (October 1994).
P. Reynolds, C. Williams, and R. Wagner, Isotach Networks, IEEE Trans. Parallel Distributed Syst., 8(4):337-348 (April 1997).
A. Karlin, K. Li, M. Manasse, and S. Owicki, Empirical Studies of Competitive Spinning for a Shared-Memory Multiprocessor, Proc. 13th ACM Symp. Oper. Syst. Principles (SOSP'91), Pacific Grove, California, pp. 41-55 (October 1991).
L. Kontothanassis, R. Wisniewski, and M. Scott, Scheduler-Conscious Synchronization, ACM Trans. Computer Syst., 15(1):3-40 (February 1997).
B. H. Lim and A. Agarwal, Waiting Algorithms for Synchronization in Large-Scale Multiprocessors, ACM Trans. Computer Syst., 11(3):253-294 (August 1993).
D. Feitelson and L. Rudolph, Gang Scheduling Performance Benefits for Fine-grain Synchronization, J. Parallel Distributed Computing, 16(4):306-318 (December 1992).
J. Ousterhout, Scheduling Techniques for Concurrent Systems, Proc. Third Int'l. Conf. Distributed Computing Systems (ICDCS'82), Miami, Florida, pp. 22-30 (October 1982).
N. Arora, R. Blumofe, and C. G. Plaxton, Thread Scheduling for Multiprogrammed Multiprocessors, Proc. Tenth ACM Symp. Parallel Algorithms and Architectures (SPAA'98), Puerto Vallarta, Mexico, pp. 119-129 (June 1998).
M. Herlihy, Wait-free Synchronization, ACM Trans. Progr. Lang. Syst., 13(1):124-149 (January 1991).
S. Lumetta and D. Culler, Managing Concurrent Access for Shared-Memory Active Messages, Proc. First IEEE-ACM Joint Int'l. Parallel Processing Symp. and Symp. Parallel and Distributed Processing (IPPS-SPDP'98), Orlando, Florida, pp. 272-276 (April 1998).
M. Michael and M. Scott, Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared-Memory Multiprocessors, J. Parallel Distributed Computing, 51(1):1-26 (May 1998).
S. Prakash, D. Lee, and T. Johnson, A Nonblocking Algorithm for Shared Queues Using Compare-and-Swap, IEEE Trans. Computers, 43(5):548-559 (May 1994).
J. Valois, Lock-Free Data Structures, Ph.D. thesis, Rensselaer Polytechnic Institute, Department of Computer Science (1995).
J. Laudon and D. Lenoski, The SGI Origin: A ccNUMA Highly Scalable Server, Proc. 24th Int'l. Symp. Computer Architecture (ISCA'97), Denver, Colorado, pp. 241-251 (June 1997).
D. Nikolopoulos and T. Papatheodorou, Fast Synchronization on Scalable Cache-Coherent Multiprocessors using Hybrid Primitives, Proc. 14th IEEE-ACM Int'l. Parallel and Distributed Processing Symp. (IPDPS'2000), Cancun, Mexico, pp. 711-719 (May 2000).
D. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture: A Hardware/Software Approach, Morgan Kaufmann (1998).
L. Rudolph and Z. Segall, Dynamic Decentralized Cache Schemes for MIMD Parallel Processors, Proc. 11th Int'l. Symp. Computer Architecture (ISCA'84), Ann Arbor, Michigan, pp. 340-347 (June 1984).
M. Herlihy and J. Wing, Axioms for Concurrent Objects, Proc. 14th ACM Symp. Principles Progr. Lang. (POPL'87), Munich, Germany, pp. 13-26 (January 1987).
M. Michael and M. Scott, Simple, Fast and Practical Nonblocking and Blocking Concurrent Queue Algorithms, Proc. 15th ACM Symp. Principles of Distributed Computing (PODC'96), Philadelphia, Pennsylvania, pp. 267-276 (1996).
B. Marsh, M. Scott, T. LeBlanc, and E. Markatos, First-Class User-Level Threads, Proc. 13th ACM Symp. Oper. Syst. Principles (SOSP'91), Pacific Grove, California, pp. 110-121 (October 1991).
X. Martorell, J. Corbalan, D. Nikolopoulos, N. Navarro, E. Polychronopoulos, T. Papatheodorou, and J. Labarta, A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU Manager, Proc. Sixth Workshop on Job Scheduling Strategies for Parallel Processing, in conjunction with IEEE IPDPS'2000, Lecture Notes in Computer Science, Vol. 1911, Cancun, Mexico, pp. 87-112 (May 2000).
Silicon Graphics Inc. IRIX 6.5 Man Pages. http://techpubs.sgi.com (November 1999).
S. C. Woo, M. Ohara, E. Torrie, J. P. Singh, and A. Gupta, The SPLASH-2 Programs: Characterization and Methodological Considerations, Proc. 22nd Int'l. Symp. Computer Architecture (ISCA'95), Santa Margherita Ligure, Italy, pp. 24-37 (June 1995).
Standard Performance Evaluation Corporation, SPEC CPU95 Documentation. http://www.spec.org (December 1999).
X. Martorell, E. Ayguadé, N. Navarro, J. Corbalan, M. Gonzalez, and J. Labarta, Thread Fork-Join Techniques for Multi-Level Parallelism Exploitation in NUMA Multiprocessors, Proc. 13th ACM Int'l. Conf. Supercomputing (ICS'99), Rhodes, Greece, pp. 294-301 (June 1999).
OpenMP Architecture Review Board, OpenMP Fortran Application Programming Interface, Version 1.1 (November 1999).
D. Craig, An Integrated Kernel and User-Level Paradigm for Efficient Multiprogramming. Technical Report CSRD No. 1533, University of Illinois at Urbana-Champaign (June 1999).
J. Torrellas, A. Tucker, and A. Gupta, Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory Multiprocessors, J. Parallel Distributed Computing, 24(2):139-151 (February 1995).
R. Arpaci, D. Culler, A. Krishnamurthy, S. Steinberg, and K. Yelick, Empirical Evaluation of the Cray T3D: A Compiler Perspective, Proc. 22nd Int'l. Symp. Computer Architecture (ISCA'95), Santa Margherita Ligure, Italy, pp. 320-331 (June 1995).
S. Scott, Synchronization and Communication in the T3E Multiprocessor, Proc. Seventh Int'l. Conf. Architectural Support for Progr. Lang. Oper. Syst. (ASPLOS-VII), Cambridge, Massachusetts, pp. 26-36 (October 1996).