Decoupling computation and data scheduling in distributed data-intensive applications

K. Ranganathan1, I. Foster1,2
1Department of Computer Science, University of Chicago, Chicago, IL, USA
2Math and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA

Abstract

In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation.
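
The decoupled arrangement described above can be illustrated with a minimal sketch. It is not the paper's algorithms: the `Site` class, the functions `schedule_job` and `replicate_popular`, the popularity threshold, and the use of queue length as "load" are all assumptions made for illustration. The job scheduler only places jobs, preferring sites that already hold the data. A separate data scheduler, invoked independently, replicates frequently accessed data sets to lightly loaded sites based on observed access counts.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical grid site with local storage and a job queue."""
    name: str
    datasets: set = field(default_factory=set)
    queue: list = field(default_factory=list)  # jobs waiting at this site

    @property
    def load(self) -> int:
        # Assumption: load is approximated by the number of queued jobs.
        return len(self.queue)


def schedule_job(job_id, dataset, sites):
    """Job scheduler: pick the least loaded site that already holds the data.

    If no site holds the data set, fall back to the least loaded site overall.
    This function never moves data itself.
    """
    holders = [s for s in sites if dataset in s.datasets]
    target = min(holders or sites, key=lambda s: s.load)
    target.queue.append((job_id, dataset))
    return target


def replicate_popular(access_counts, sites, threshold):
    """Data scheduler: runs separately from job placement.

    Replicates each data set accessed at least `threshold` times to the
    least loaded site that does not yet hold it, then resets the counters.
    """
    moved = []
    for dataset, count in access_counts.items():
        if count < threshold:
            continue
        candidates = [s for s in sites if dataset not in s.datasets]
        if candidates:
            target = min(candidates, key=lambda s: s.load)
            target.datasets.add(dataset)
            moved.append((dataset, target.name))
    access_counts.clear()
    return moved


# Usage: four jobs all need "d1", which initially lives only at site A.
sites = [Site("A", {"d1"}), Site("B"), Site("C")]
accesses = Counter()
for job_id in range(4):
    schedule_job(job_id, "d1", sites)
    accesses["d1"] += 1

# The data scheduler later notices the hot data set and replicates it.
moved = replicate_popular(accesses, sites, threshold=3)

# Subsequent jobs can now be placed at the new, idle replica site.
next_site = schedule_job(4, "d1", sites)
```

In this sketch the two schedulers share only the replica locations and the access counters, so either policy could be swapped out without changing the other. That separation is the design simplification the abstract points to.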

Keywords

#Distributed computing #Processor scheduling #Scheduling algorithm #Application software #Computer science #Large-scale systems #Resource management #Physics computing #Laboratories #Bioinformatics
