Predicting sporadic grid data transfers

S. Vazhkudai1,2, J.M. Schopf3,2
1Department of Computer and Information Sciences, University of Mississippi, USA
2Mathematics and Computer Science Division, Argonne National Laboratory, USA
3Computer Science Department, Northwestern University, USA

Abstract

The increasingly common practice of replicating datasets and using resources as distributed data stores in grid environments has led to the problem of determining which replica can be accessed most efficiently. Because the components in the end-to-end path linking these locations have diverse performance characteristics and varying loads, selecting one replica from among many requires accurate predictions of the data transfer times between sources and sinks. In this paper we present a prediction system that combines end-to-end application throughput observations, which capture whole-system performance, with network load measurements, which capture variations in load patterns. We develop a set of regression models to derive predictions that characterize the effect of network load variations on file transfer times. We apply these techniques to the GridFTP data movement tool, part of the Globus Toolkit™, and observe gains of up to 10% in prediction accuracy compared with approaches based on past system behavior in isolation.
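As a rough illustration of the regression idea described above, the following sketch fits a simple least-squares line relating a network load measurement to observed transfer throughput, then uses it to estimate a transfer time. It is not the paper's actual model set: the function names, the choice of a single linear predictor, the units (MB/s), and all data values are assumptions made for illustration only.

```python
# Minimal sketch: simple linear regression of observed transfer throughput
# on a network probe measurement. All names and data are hypothetical.

def fit_linear(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_transfer_time(file_size_mb, probe_mbs, slope, intercept):
    """Estimate transfer time in seconds from the current probe measurement."""
    throughput = slope * probe_mbs + intercept  # predicted throughput in MB/s
    if throughput <= 0:
        raise ValueError("predicted throughput must be positive")
    return file_size_mb / throughput


# Synthetic history: probe bandwidth (MB/s) paired with observed
# end-to-end transfer throughput (MB/s). These values are invented.
probe_history = [2.0, 4.0, 6.0, 8.0]
throughput_history = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_linear(probe_history, throughput_history)
estimate = predict_transfer_time(100.0, 4.0, slope, intercept)
print(f"slope={slope:.2f} intercept={intercept:.2f} estimate={estimate:.1f}s")
```

In a replica selection setting, one such estimate could be computed per candidate source and the replica with the smallest predicted time chosen; that usage is likewise an assumption about how the predictions might be consumed.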

Keywords

#Load management #Predictive models #Computer science #Grid computing #Mathematics #Distributed computing #Joining processes #Throughput #Performance gain #Accuracy
