Network performance-aware collective communication for clustered wide-area systems

Parallel Computing - Volume 27, Issue 11 - Pages 1431-1456 - 2001
Thilo Kielmann¹, Henri E. Bal¹, Sergei Gorlatch², Kees Verstoep¹, Rutger F. H. Hofman¹
¹ Division of Mathematics and Computer Science, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
² Department of Computer Science, Technical University of Berlin, Franklinstr. 28/29, 10587 Berlin, Germany

References

A. Alexandrov, M.F. Ionescu, K.E. Schauser, C. Scheiman, LogGP: incorporating long messages into the LogP model – one step closer towards a realistic model for parallel computation, in: Proceedings of the Symposium on Parallel Algorithms and Architectures (SPAA), Santa Barbara, CA, July 1995, pp. 95–105

Bal, 1998, Performance evaluation of the Orca shared object system, ACM Trans. Comput. Syst., 16, 1, 10.1145/273011.273014

M. Banikazemi, V. Moorthy, D. Panda, Efficient collective communication on heterogeneous networks of workstations, in: International Conference on Parallel Processing, Minneapolis, MN, August 1998, pp. 460–467

Bernaschi, 1998, Collective communication operations: experimental results vs. theory, Concurrency: Practice and Experience, 10, 359, 10.1002/(SICI)1096-9128(19980425)10:5<359::AID-CPE323>3.0.CO;2-7

Bhoedjang, 1998, User-level network interface protocols, IEEE Comput., 31, 53, 10.1109/2.730737

Boden, 1995, Myrinet: a gigabit-per-second local area network, IEEE Micro, 15, 29, 10.1109/40.342015

Bruck, 1996, On the design and implementation of broadcast and global combine operations using the postal model, IEEE Trans. Parallel Distrib. Syst., 7, 256, 10.1109/71.491579

Catlett, 1992, Commun. ACM, 35, 44, 10.1145/129888.129890

D. Culler, R. Karp, D. Patterson, A. Sahay, K.E. Schauser, E. Santos, R. Subramonian, T. von Eicken, LogP: towards a realistic model of parallel computation, in: Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP), San Diego, CA, May 1993, pp. 1–12

Culler, 1996, Assessing fast network interfaces, IEEE Micro, 16, 35, 10.1109/40.482310

G.E. Fagg, K.S. London, J.J. Dongarra, MPI_Connect: managing heterogeneous MPI applications interoperation and process control, in: Proceedings of the 5th European PVM/MPI Users' Group Meeting, number 1497 in LNCS, Liverpool, UK, 1998, pp. 93–96

Foster, 1997, Globus: a metacomputing infrastructure toolkit, Int. J. Supercomput. Appl., 11, 115, 10.1177/109434209701100205

I. Foster, C. Kesselman (Eds.), The GRID: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, Los Altos, CA, 1998

E. Gabriel, M. Resch, T. Beisel, R. Keller, Distributed computing in a heterogeneous computing environment, in: Proceedings of the 5th European PVM/MPI Users' Group Meeting, number 1497 in LNCS, Liverpool, UK, 1998, pp. 180–187

W. George, J. Hagedorn, J. Devaney, Status report on the development of the interoperable MPI protocol, in: Proceedings of MPIDC'99, Message Passing Interface Developer's and User's Conference, Atlanta, GA, March 1999, pp. 7–13

M. Gołebiewski, R. Hempel, J.L. Träff, Algorithms for collective communication operations on SMP clusters, in: The 1999 Workshop on Cluster-Based Computing, held in conjunction with 13th ACM-SIGARCH International Conference on Supercomputing (ICS'99), 1999, pp. 11–15

S. Gorlatch, C. Wedler, C. Lengauer, Optimization rules for programming with collective operations, in: Proceedings of the 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing (IPPS/SPDP'99), 1999, pp. 492–499

J.-P. Goux, S. Kulkarni, J. Linderoth, M. Yoder, An enabling framework for master–worker applications on the computational grid, in: Proceedings of the High Performance Distributed Computing (HPDC 2000), Pittsburgh, PA, August 2000, pp. 43–50

A.S. Grimshaw, W.A. Wulf, and the Legion team, The Legion vision of a worldwide virtual computer, Commun. ACM 40 (1) (1997) 39–45

Gropp, 1996, A high-performance, portable implementation of the MPI message passing interface standard, Parallel Comput., 22, 789, 10.1016/0167-8191(96)00024-5

W.D. Gropp, E. Lusk, D. Swider, Improving the performance of MPI derived datatypes, in: Proceedings of MPIDC'99, Message Passing Interface Developer's and User's Conference, Atlanta, GA, March 1999, pp. 25–30

P. Husbands, J.C. Hoe, MPI-StarT: delivering network performance to numerical applications, in: Proceedings of SC'98, November 1998; Online at http://www.supercomp.org/sc98/proceedings/

G. Iannello, M. Lauria, S. Mercolino, Cross-platform analysis of fast messages for Myrinet, in: Proceedings of the Workshop on CANPC'98, Lecture Notes in Computer Science, vol. 1362, Las Vegas, Nevada, Springer, Berlin, January 1998, pp. 217–231

N.T. Karonis, B.R. de Supinski, I. Foster, W. Gropp, E. Lusk, J. Bresnahan, Exploiting hierarchy in parallel computer networks to optimize collective operation performance, in: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2000), Cancun, Mexico, IEEE, New York, May 2000, pp. 377–384

R.M. Karp, A. Sahay, E.E. Santos, K.E. Schauser, Optimal broadcast and summation in the LogP model, in: Proceedings of the Symposium on Parallel Algorithms and Architectures (SPAA), Velen, Germany, June 1993, pp. 142–153

R. Kesavan, D.K. Panda, Optimal multicast with packetization and network interface support, in: Proceedings of the International Conference on Parallel Processing, IEEE, New York, August 1997, pp. 370–377

T. Kielmann, H.E. Bal, S. Gorlatch, Bandwidth-efficient collective communication for clustered wide area systems, in: Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2000), Cancun, Mexico, IEEE, New York, May 2000, pp. 492–499

T. Kielmann, H.E. Bal, K. Verstoep, Fast measurement of LogP parameters for message passing platforms, in: 4th Workshop on Runtime Systems for Parallel Programming (RTSPP), Lecture Notes in Computer Science, vol. 1800, Cancun, Mexico, Springer, Berlin, May 2000, pp. 1176–1183

T. Kielmann, R.F.H. Hofman, H.E. Bal, A. Plaat, R.A.F. Bhoedjang, MPI's reduction operations in clustered wide area systems, in: Proceedings of MPIDC'99, Message Passing Interface Developer's and User's Conference, Atlanta, GA, March 1999, pp. 43–52

T. Kielmann, R.F.H. Hofman, H.E. Bal, A. Plaat, R.A.F. Bhoedjang, MagPIe: MPI's collective communication operations for clustered wide area systems, in: Proceedings of the Symposium on Principles and Practice of Parallel Programming (PPoPP), Atlanta, GA, May 1999, pp. 131–140

B. Lowekamp, A. Beguelin, ECO: efficient collective operations for communication on heterogeneous networks, in: International Parallel Processing Symposium (IPPS), Honolulu, HI, 1996, pp. 399–405

Maillet, 1995, On efficiently implementing global time for performance evaluation on multiprocessor systems, J. Parallel Distrib. Comput., 28, 84, 10.1006/jpdc.1995.1090

J.-Y.L. Park, H.-A. Choi, N. Nupairoj, L.M. Ni, Construction of optimal multicast trees based on the parameterized communication model, in: Proceedings of the International Conference on Parallel Processing (ICPP), vol. I, 1996, pp. 180–187

V. Paxson, On calibrating measurements of packet transit times, in: Proceedings of SIGMETRICS'98/PERFORMANCE'98, Madison, Wisconsin, June 1998, pp. 11–21

Santos, 1999, Optimal and near-optimal algorithms for k-item broadcast, J. Parallel Distrib. Comput., 57, 121, 10.1006/jpdc.1999.1529

van de Geijn, 1994, On global combine operations, J. Parallel Distrib. Comput., 22, 324, 10.1006/jpdc.1994.1091

Watts, 1995, A pipelined broadcast for multidimensional meshes, Parallel Process. Lett., 5, 281, 10.1142/S0129626495000266

R. Wolski, Forecasting network performance to support dynamic scheduling using the network weather service, in: Proceedings of the High-Performance Distributed Computing (HPDC-6), Portland, OR, August 1997, pp. 316–325; the network weather service is at http://nws.npaci.edu/