Reclaiming space from duplicate files in a serverless distributed file system

J.R. Douceur1, A. Adya1, W.J. Bolosky1, P. Simon1, M. Theimer1
1Microsoft Research Limited, USA

Tóm tắt

The Farsite distributed file system provides availability by replicating each file onto multiple desktop computers. Since this replication consumes significant storage space, it is important to reclaim used space where possible. Measurement of over 500 desktop file systems shows that nearly half of all consumed space is occupied by duplicate files. We present a mechanism to reclaim space from this incidental duplication to make it available for controlled file replication. Our mechanism includes: (1) convergent encryption, which enables duplicate files to be coalesced into the space of a single file, even if the files are encrypted with different users' keys; and (2) SALAD, a Self-Arranging Lossy Associative Database for aggregating file content and location information in a decentralized, scalable, fault-tolerant manner. Large-scale simulation experiments show that the duplicate-file coalescing system is scalable, highly effective, and fault-tolerant.

Từ khóa

#File systems #Cryptography #Extraterrestrial measurements #Databases #Large-scale systems #File servers #Availability #Fault diagnosis #Secure storage #Distributed computing

Tài liệu tham khảo

wolf, 1989, The Placement Optimization Program: A Practical Solution to the Disk File Assignment Problem, ACM SIGMETRICS 89, 10.1145/75108.75373 waldman, 2000, Publius: A Robust, Tamper-Evident Censorship-Resistant Web Publishing System, 9th USENIX Security Symposium, 59 10.1109/MRD.1990.138237 saroiu, 2002, A Measurement Study of Peer-to-Peer File Sharing Systems, SPIE Multimedia Computing and Networking 2002 rowstron, 0, Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems, SIGCOMM 2001 submission 10.1145/964723.383072 10.1145/268998.266694 10.1145/964723.383071 song, 2000, Practical Techniques for Searches on Encrypted Data, IEEE Symposium on Security and Privacy, 44 solomon, 1998, Inside Windows NT cabri, 1996, Experience of Adaptive Replication in Distributed File Systems, 22nd EUROMICRO IEEE, 459 10.1145/510726.510755 castro, 1999, Practical Byzantine Fault Tolerance, 3rd OSDI USENIX, 173 clarke, 2000, Freenet: A Distributed Anonymous Information Storage and Retrieval System, Proc ICSI Workshop Design Issues in Anonymity and Unobservability 10.1145/301453.301480 10.1109/RELDIS.2001.969727 10.1109/ICDCS.2002.1022312 droms, 1997, Dynamic Host Configuration Protocol, RFC 2131 10.1145/75577.75583 10.1109/ADL.1998.670389 guy, 1990, Implementation of the Ficus Replicated File System, 1990 USENIX Conference Usenix, 63 10.1145/502034.502052 10.1145/168588.168596 menezes, 1997, Handbook of Applied Cryptography anderson, 1996, Two Practical and Provably Secure Block Ciphers: BEAR and LION, Proc 3rd Fast Software Encryption Workshop, 113, 10.1007/3-540-60865-6_48 10.1145/362686.362692 10.1145/258492.258523 biham, 1991, Differential Fault Analysis of Secret Key Cryptosystems, CRYPTO'91, 156 10.1145/339331.339345 bolosky, 2000, Single Instance Storage in Windows® 2000, Proc 4th Usenix Windows Systems Symp, 13 10.1145/224056.224066 brinkmann, 2000, Efficient, Distributed Data Placement Strategies for Storage Area Networks, 12th SPAA ACM alexander, 1997, DHCP Options and BOOTP Vendor Extensions, RFC 2132 10.1145/35037.35059 kelsey, 2000, Side Channel Cryptanalysis of Product Ciphers, Journal of Computer Security, 8, 141, 10.3233/JCS-2000-82-304 jain, 1991, The Art of Computer Systems Performance Analysis, John Wiley & Sons kocher, 1999, Differential Power Analysis, Journal of Cryptology'99, 388, 10.1007/3-540-48405-1_25 kocher, 1996, Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS and Other Systems, Crypto, 104 10.1109/MRD.1992.242617 kure, 1988, Optimization of File Migration in Distributed Systems, Technical Report UCB/CSD 88/413 University of California at Berkeley