A high-performance cluster storage server

K. Bell1, A. Chien1, M. Lauria2
1Department of Computer Science and Engineering, University of California, San Diego, USA
2Department of Computer and Information Science, Ohio State University, Columbus, OH, USA

Abstract

An essential building block for any data grid infrastructure is the storage server. In this paper we describe a high-performance cluster storage server built around the SDSC Storage Resource Broker (SRB) and commodity workstations. We describe a number of performance-critical design issues and our solutions to them. We incorporate pipeline optimizations into SRB to enable the full overlapping of communication and disk I/O. With these optimizations, we deliver to the application more than 95% of the disk throughput achievable through a remote connection. We then show that our approach to network-striped transport achieves aggregate cluster-to-cluster throughput that scales with the number of connections. Finally, we present a federated SRB service over MPI that allows fast TCP connections to stripe data across multiple server disks, reaching 97% of the combined write capacity of multiple nodes.
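The idea of overlapping communication with disk I/O can be illustrated with a minimal sketch. This is not the SRB implementation: the function name `pipelined_copy`, the `chunk_size` and `depth` parameters, and the use of a thread with a bounded queue are all assumptions made for illustration. One thread reads chunks from a source (standing in for the network connection) while the caller writes earlier chunks to a sink (standing in for the disk), so neither side has to wait for the whole transfer to finish.

```python
import io
import queue
import threading


def pipelined_copy(source, sink, chunk_size=64 * 1024, depth=4):
    """Copy source to sink while overlapping reads and writes.

    Hypothetical helper: a reader thread fills a bounded queue with
    chunks; the calling thread drains the queue and writes each chunk.
    """
    buffers = queue.Queue(maxsize=depth)  # bounds the memory in flight
    errors = []

    def reader():
        try:
            while True:
                chunk = source.read(chunk_size)
                if not chunk:
                    break
                buffers.put(chunk)  # blocks when the writer falls behind
        except Exception as exc:  # report read failures to the caller
            errors.append(exc)
        finally:
            buffers.put(None)  # sentinel: no more data

    # Daemon thread so a failing writer cannot leave the process hanging.
    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    total = 0
    while True:
        chunk = buffers.get()
        if chunk is None:
            break
        sink.write(chunk)  # runs while the reader fetches the next chunk
        total += len(chunk)

    thread.join()
    if errors:
        raise errors[0]
    return total


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000
    copied = pipelined_copy(io.BytesIO(payload), io.BytesIO(), chunk_size=4096)
    print(copied == len(payload))
```

The queue depth sets how far the reader may run ahead of the writer. In this sketch the in-memory `io.BytesIO` objects merely stand in for a socket and a file; real overlap only pays off when both sides block on actual I/O.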

Keywords

#Grid computing #Network servers #Web server #Throughput #Geophysics computing #Middleware #Aggregates #High performance computing #Bandwidth #Internet
