Distributed computing with load-managed active storage

R. Wickremesinghe, J.S. Chase, J.S. Vitter
Department of Computer Science, Duke University, Durham, NC, USA

Abstract

One approach to high-performance processing of massive data sets is to incorporate computation into storage systems. Previous work has shown that this active storage model is effective for a variety of problems. This paper explores opportunities to use active storage as a basis for exploiting asymmetric parallelism in applications that use a streaming computation model on collections of fixed-size records. This model is the basis for much of the research in I/O-efficient algorithms, which deals with an important class of massive data problems not studied in previous work on active storage. We present an extension of a streaming computation model for an external memory toolkit to support a flexible mapping of computations to storage-based processors. Our approach enables load-managed active storage: it exposes parallelism, ordering constraints, and primitive computation units to the system, which can configure the application to balance load and make the best use of available processing power. Emulation results from a sorting application demonstrate the potential of dynamic adaptation in load-managed active storage.
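As a rough illustration of the idea in the abstract, the sketch below streams fixed-size records through a primitive computation unit and chooses where each pipeline stage runs. It is not the paper's toolkit or API: the record layout (`RECORD`), the names `read_records`, `key_filter`, and `place_stages`, the per-stage cost numbers, and the greedy prefix placement rule are all assumptions made for this example.

```python
import struct
from typing import Iterable, Iterator, List, Tuple

# Hypothetical fixed-size record: 4-byte key plus 8-byte payload (12 bytes).
RECORD = struct.Struct("<IQ")

Record = Tuple[int, int]


def read_records(buf: bytes) -> Iterator[Record]:
    """Stream fixed-size records out of a byte buffer, one at a time."""
    for offset in range(0, len(buf), RECORD.size):
        yield RECORD.unpack_from(buf, offset)


def key_filter(records: Iterable[Record], lo: int, hi: int) -> Iterator[Record]:
    """A primitive computation unit: keep records with lo <= key < hi."""
    for key, payload in records:
        if lo <= key < hi:
            yield key, payload


def place_stages(stage_costs: List[float], storage_capacity: float) -> List[str]:
    """Assumed placement rule, for illustration only.

    Stages form an ordered pipeline, so only a leading prefix may run on
    the storage-side processor. Stages are pushed to storage until the
    assumed capacity budget is exhausted; the rest run on the host.
    """
    placement: List[str] = []
    used = 0.0
    on_storage = True
    for cost in stage_costs:
        if on_storage and used + cost <= storage_capacity:
            used += cost
            placement.append("storage")
        else:
            on_storage = False  # ordering constraint: no return to storage
            placement.append("host")
    return placement
```

Re-running `place_stages` with a smaller `storage_capacity` when the storage processors become busy moves later stages back to the host, which is one simple way to picture the dynamic adaptation the abstract refers to.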

Keywords

#Distributed computing #Parallel processing #Power system modeling #Computational modeling #Concurrent computing #Computer networks #Large-scale systems #Computer science #Emulation #Sorting