The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors

International Journal of Computer & Information Sciences - 2001

Dimitrios S. Nikolopoulos¹, Theodore S. Papatheodorou²

¹Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana

²Department of Computer Engineering and Informatics, University of Patras, Patras, Greece

Abstract

This paper investigates the performance of synchronization algorithms on ccNUMA multiprocessors from the perspectives of the architecture and the operating system. In contrast with previous related studies, which emphasized the relative performance of synchronization algorithms, this paper analyzes the sources of synchronization latency on ccNUMA architectures and how this latency can be reduced by leveraging hardware and software schemes in both dedicated and multiprogrammed execution environments. From the architectural perspective, the paper identifies the implications of directory-based cache coherence for the latency and scalability of synchronization instructions and examines whether and how simple hardware that accelerates these instructions can be leveraged to reduce synchronization latency. From the operating system's perspective, the paper evaluates, in a unified framework, user-level, kernel-level, and hybrid algorithms for implementing scalable synchronization in multiprogrammed execution environments. In addition to addressing these issues, the paper contributes a new methodology for implementing fast synchronization algorithms on ccNUMA multiprocessors. The relevant experiments are conducted on the SGI Origin2000, a popular commercial ccNUMA multiprocessor.
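To make the distinction between user-level, kernel-level, and hybrid waiting concrete, the following is a minimal sketch of one generic hybrid idea, spin-then-block: a thread polls the lock a bounded number of times and, if that fails, falls back to a blocking acquire. It is not the algorithm or methodology evaluated in the paper. The names `SpinThenBlockLock`, `spin_limit`, and `run_counter_demo` are hypothetical, and because Python threads share an interpreter lock, the sketch shows only the control flow, not the latency behavior of a ccNUMA machine.

```python
import threading


class SpinThenBlockLock:
    """Hypothetical hybrid lock: poll a bounded number of times, then block."""

    def __init__(self, spin_limit=200):
        self._lock = threading.Lock()
        self._spin_limit = spin_limit  # assumed tuning knob, not from the paper
        self.spin_acquires = 0   # acquisitions that succeeded while polling
        self.block_acquires = 0  # acquisitions that fell back to blocking

    def acquire(self):
        # Spinning phase: non-blocking attempts, no suspension of the thread.
        for _ in range(self._spin_limit):
            if self._lock.acquire(blocking=False):
                self.spin_acquires += 1  # safe: updated while holding the lock
                return
        # Blocking phase: let the runtime suspend the thread until release.
        self._lock.acquire()
        self.block_acquires += 1

    def release(self):
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False


def run_counter_demo(num_threads=4, increments=1000, spin_limit=200):
    """Increment a shared counter under the lock; return (total, acquisitions)."""
    lock = SpinThenBlockLock(spin_limit)
    total = [0]  # boxed so the worker closure can mutate it

    def worker():
        for _ in range(increments):
            with lock:
                total[0] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], lock.spin_acquires + lock.block_acquires
```

Calling `run_counter_demo(4, 1000)` returns the final counter value together with the number of lock acquisitions. Setting `spin_limit=0` degenerates to a purely blocking lock, while a very large `spin_limit` approximates a pure spin lock, which is the trade-off a hybrid scheme tries to balance when threads can be preempted.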

