Want to scale in centralized systems? Think P2P
Tóm tắt
Peer-to-peer (P2P) systems have been widely researched over the past decade, leading to highly scalable implementations for a wide range of distributed services and applications. A P2P system assigns symmetric roles to machines, which can act both as client and server. This alleviates the need for any central component to maintain a global knowledge of the system. Instead, each peer takes individual decisions based on a local knowledge of the rest of the system, providing scalability by design. While P2P systems have been successfully applied to a wide range of distributed applications (multicast, routing, storage, pub-sub, video streaming), with some highly visible successes (Skype, Bitcoin), they tend to have fallen out of fashion in favor of a much more cloud-centric vision of the current Internet. We think this is paradoxical, as cloud-based systems are themselves large-scale, highly distributed infrastructures. They reside within massive, densely interconnected datacenters, and must execute efficiently on an increasing number of machines, while dealing with growing volumes of data. Today even more than a decade ago, large-scale systems require scalable designs to deliver efficient services. In this paper we argue that the local nature of P2P systems is key for scalability regardless whether a system is eventually deployed on a single multi-core machine, distributed within a data center, or fully decentralized across multiple autonomous hosts. Our claim is backed by the observation that some of the most scalable services in use today have been heavily influenced by abstractions and rationales introduced in the context of P2P systems. Looking to the future, we argue that future large-scale systems could greatly benefit from fully decentralized strategies inspired from P2P systems. We illustrate the P2P legacy through several examples related to Cloud Computing and Big Data, and provide general guidelines to design large-scale systems according to a P2P philosophy.
Tài liệu tham khảo
Rodrigues R, Druschel P (2010) Peer-to-peer systems. Commun ACM 53(10): 72–82. doi:10.1145/1831407.1831427.
Rowstron AIT, Druschel P (2001) Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems In: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg. Middleware ’01, 329–350.. Springer, London, UK. http://dl.acm.org/citation.cfm?id=646591.697650.
Stoica I, Morris R, Karger D, Kaashoek MF, Balakrishnan H (2001) Chord: A scalable peer-to-peer lookup service for internet applications In: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. SIGCOMM ’01, 149–160.. ACM, New York, USA. doi:10.1145/383059.383071.
Huang Y, Chen YF, Jana R, Jiang H, Rabinovich M, Reibman A, Wei B, Xiao Z (2007) Capacity analysis of mediagrid: a p2p iptv platform for fiber to the node (fttn) networks. IEEE J Selected Areas Commun 25(1): 131–139.
Yin H, Liu X, Zhan T, Sekar V, Qiu F, Lin C, Zhang H, Li B (2010) LiveSky: Enhancing CDN with P2P. ACM Trans Multimedia Comput Commun Appl 6: 16–11619.
Kreitz G, Niemelä F (2010) Spotify – large scale, low latency, P2P music-on-demand streaming In: IEEE Tenth International Conference on Peer-to-Peer Computing, P2P 2010, Delft, The Netherlands, 25-27 August 2010.. IEEE.
Zhao M, Aditya P, Chen A, Lin Y, Haeberlen A, Druschel P, Maggs B, Wishon B, Ponec M (2013) Peer-assisted content distribution in Akamai NetSession In: Proc of the 2013 Internet Measurement Conference, IMC 2013, 31–42.. ACM, New York, USA.
Kaufman M (2013) Skype / NSA. http://www.listbox.com/member/archive/247/2013/06/entry/6:271/20130623090855. 0B714E0A-DC06-11E2-9F35-8CD4CCA160A2/. (e-mail, forwarded by Dave Farber to [email protected]), Accessed 2 June 2015.
Bai X, Guerraoui R, Kermarrec AM, Leroy V (2011) Collaborative personalized top-k processing. ACM Trans Database Syst 36(4): 38. doi:10.1145/2043652.2043659.
Dong W, Moses C, Li K (2011) Efficient k-nearest neighbor graph construction for generic similarity measures In: Proceedings of the 20th International Conference on World Wide Web. WWW ’11, 577–586.. ACM, New York, USA. doi:10.1145/1963405.1963487.
DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: amazon’s highly available key-value store In: Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles. SOSP ’07, 205–220.. ACM, New York, USA. doi:10.1145/1294261.1294281.
Lakshman A, Malik P (2010) Cassandra: a decentralized structured storage system, 35–40.. ACM, New York, USA. doi:10.1145/1773912.1773922.
Ratnasamy S, Francis P, Handley M, Karp R, Shenker S (2001) A scalable content-addressable network In: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. SIGCOMM ’01, 161–172.. ACM, New York, USA. doi:10.1145/383059.383072.
Zhao BY, Kubiatowicz J, Joseph AD (2002) Tapestry: a fault-tolerant wide-area application infrastructure. Comput Commun Rev 32(1): 81.
Karger D, Lehman E, Leighton T, Panigrahy R, Levine M, Lewin D (1997) Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the world wide web In: Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing. STOC ’97, 654–663.. ACM, New York, USA. doi:10.1145/258533.258660.
Haeberlen A, Mislove A, Druschel P (2005) Glacier: highly durable, decentralized storage despite massive correlated failures In: Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation - Volume 2. NSDI’05, 143–158.. USENIX Association, Berkeley, CA, USA. http://dl.acm.org/citation.cfm?id=1251203.1251214.
Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE (2006) Bigtable: a distributed storage system for structured data In: Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7. OSDI ’06, 15–15.. USENIX Association, Berkeley, CA, USA. http://dl.acm.org/citation.cfm?id=1267308.1267323.
Bharambe AR, Agrawal M, Seshan S (2004) Mercury: supporting scalable multi-attribute range queries. ACM SIGCOMM Comput Commun Rev 34(4): 353–366.
Guerraoui R, Handurukande SB, Huguenin K, Kermarrec AM, Le Fessant F, Riviere E (2006) Gosskip, an efficient, fault-tolerant and self organizing overlay using gossip-based construction and skip-lists principles In: Proceedings of the Sixth IEEE International Conference on Peer-to-Peer Computing. P2P ’06, 12–22.. IEEE Computer Society, Washington, DC, USA. doi:10.1109/P2P.2006.19.
Vilaça R, Oliveira R, Pereira J (2011) A correlation-aware data placement strategy for key-value stores In: Proceedings of the 11th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems. DAIS’11, 214–227.. Springer, Berlin, Heidelberg. http://dl.acm.org/citation.cfm?id=2022090.2022107.
Demers A, Greene D, Houser C, Irish W, Larson J, Shenker S, Sturgis H, Swinehart D, Terry D (1987) Epidemic algorithms for replicated database maintenance In: Proc. of the 6th Annual ACM Symposium. on Principles of Distributed Computing (PODC 1987), 1–12.. ACM, New York, USA.
Herlihy MP, Wing JM (1990) Linearizability: a correctness condition for concurrent objects. ACM Trans Program Lang Syst 12(3): 463–492. doi:10.1145/78969.78972.
Lamport L (1979) How to make a multiprocessor computer that correctly executes multiprocess programs. Comput IEEE Trans 100(9): 690–691.
Corbett JC, Dean J, Epstein M, Fikes A, Frost C, Furman JJ, Ghemawat S, Gubarev A, Heiser C, Hochschild P, Hsieh W, Kanthak S, Kogan E, Li H, Lloyd A, Melnik S, Mwaura D, Nagle D, Quinlan S, Rao R, Rolig L, Saito Y, Szymaniak M, Taylor C, Wang R, Woodford D (2012) Spanner: Google’s globally-distributed database In: Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation. OSDI’12, 251–264.. USENIX Association, Berkeley, CA, USA. http://dl.acm.org/citation.cfm?id=2387880.2387905.
Glendenning L, Beschastnikh I, Krishnamurthy A, Anderson T (2011) Scalable consistency in scatter In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles. SOSP ’11, 15–28.. ACM, New York, USA. doi:10.1145/2043556.2043559.
Lamport L (2001) Paxos made simple. ACM Sigact News 32(4): 18–25.
Lamport L (1998) The part-time parliament. ACM Trans Comput Syst 16(2): 133–169. doi:10.1145/279227.279229.
Gray J, Lamport L (2006) Consensus on transaction commit. ACM Trans Database Syst (TODS) 31(1): 133–160.
Fischer MJ, Lynch NA, Paterson MS (1985) Impossibility of distributed consensus with one faulty process. J ACM 32(2): 374–382. doi:10.1145/3149.214121.
Gilbert S, Lynch N (2002) Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 33(2): 51–59. doi:10.1145/564585.564601.
Ekstrand MD, Riedl JT, Konstan JA (2011) Collaborative Filtering Recommender Systems. Now Publishers Inc., Boston - Delft.
Voulgaris S, van Steen M (2013) Vicinity: A pinch of randomness brings out the structure In: Proc. of the ACM/IFIP/USENIX 14th Int. Conf. on Middleware. Middleware’13, 21–40.. Springer Verlag, New York, USA.
Bertier M, Frey D, Guerraoui R, Kermarrec AM, Leroy V (2010) The gossple anonymous social network In: Proc. of the ACM/IFIP/USENIX 11th Int. Conf. on Middleware. Middleware’10, 191–211.. Springer Verlag, New York, USA.
Bai X, Bertier M, Guerraoui R, Kermarrec AM, Leroy V (2010) Gossiping personalized queries In: Proc. of the 13th Int. Conf. on Extending Database Technology. EDBT’10, 87–98.. ACM, New York, USA.
Voulgaris S, van Steen M, Iwanicki K (2007) Proactive gossip-based management of semantic overlay networks. Concurr Comput Pract Experience 19(17): 2299–2311.
Das AS, Datar M, Garg A, Rajaram S (2007) Google news personalization: scalable online collaborative filtering In: Proc of the 16th International Conference on World Wide Web. WWW’07, 271–280.. ACM, New York, USA.
Dean J, Ghemawat S (2010) Mapreduce: a flexible data processing tool. Commun ACM 53(1): 72–77.
Linden G, Smith B, York J (2003) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing 7(1): 76–80.
Meisner D, Sadler CM, Barroso LA, Weber WD, Wenisch TF (2011) Power management of online data-intensive services. SIGARCH Comput Archit News 39(3): 319–330.
Jelasity M, Montresor A, Babaoglu Ö (2009) T-man: Gossip-based fast overlay topology construction. Comput Netw 53(13): 2321–2339.
Boutet A, Frey D, Guerraoui R, Jégou A, Kermarrec AM (2013) Whatsup: A decentralized instant news recommender In: Proc of the 27th International Symposium on Parallel and Distributed Processing. IPDPS 2013, 741–752.. IEEE.
Eugster P, Felber P, Le Fessant F (2007) The “art” of programming gossip-based systems. SIGOPS Oper Syst Rev 41: 37–42.
van Renesse R, Minsky Y, Hayden M (1998) A gossip-style failure detection service In: Proc. of the IFIP Int. Conf. on Distributed Systems Platforms and Open Distributed Processing. Middleware’98, 55–70.. Springer, London, UK.
Kermarrec AM, van Steen M (2007) Gossiping in distributed systems. SIGOPS Oper Syst Rev 41(5): 2–7.
Jelasity M, Voulgaris S, Guerraoui R, Kermarrec AM, van Steen M (2007) Gossip-based peer sampling. ACM Trans Comput Syst 25(3). doi:10.1145/1275517.1275520.
Voulgaris S, van Steen M (2005) Epidemic-style management of semantic overlays for content-based searching In: Proc of Euro-Par Parallel Processing, 1143–1152.. Springer Verlag, New york, USA.
Handurukande SB, Kermarrec AM, Le Fessant F, Massoulie L, Patarin S (2006) Peer Sharing Behaviour in the eDonkey Network, and Implications for the Design of Server-less File Sharing Systems In: Proc of the 1st ACM/SIGOPS/Eurosys European Conference on Computer Systems. EuroSys’06, 359–371.. ACM, New York, USA.
Voulgaris S, Kermarrec AM, Massoulié L, van Steen M (2004) Exploiting semantic proximity in peer-to-peer content searching In: Proc of the 10th IEEE International Workshop on Future Trends of Distributed Computing Systems. FTDCS 2004, 238–243.. IEEE.
Kermarrec AM, Leroy V, Trédan G (2011) Distributed social graph embedding In: Proc of the 20th ACM Conference on Information and Knowledge Management. CIKM 2011, 1209–1214.. ACM, New York, USA.
Baldoni R, Beraldi R, Quéma V, Querzoni L, Piergiovanni ST (2007) Tera: topic-based event routing for peer-to-peer architectures In: Proc of the 2007 Inaugural International Conference on Distributed Event-based Systems. DEBS 2007, 2–13.. ACM, New York, USA.
Tribler (2010). http://www.tribler.org. accessed 2 June 2015.
Miller BN, Konstan JA, Riedl J (2004) Pocketlens: Toward a personal recommender system. ACM Trans Inf Syst 22(3): 437–476.
Kermarrec AM, Leroy V, Moin A, Thraves C (2010) Application of random walks to decentralized recommender systems In: Proc of the 14th International Conference. OPODIS 2010, 48–63.. Springer.
Boutet A, Frey D, Kermarrec R. G. A. -M, Patra R (2014) Hyrec: Leveraging browsers for scalable recommenders In: Proc. of the ACM/IFIP/USENIX Int. Conf. on Middleware. Middleware’14, 85–96.. Springer Verlag, New York, USA.
Pujol JM, Erramilli V, Siganos G, Yang X, Laoutaris N, Chhabra P, Rodriguez P (2012) The little engine(s) that could: Scaling online social networks. IEEE/ACM Trans Netw 20(4): 1162–1175.
Costa P, Donnelly A, Rowstron AIT, O’Shea G (2012) Camdoop: Exploiting in-network aggregation for big data applications In: Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2012, San Jose, CA, USA, April 25-27, 2012, 29–42.. ACM, New York, USA.
Chowdhury M, Zaharia M, Ma J, Jordan MI, Stoica IManaging data transfers in computer clusters with orchestra In: Proc of the ACM SIGCOMM 2011 Conference, 98–109.. ACM, New York, USA.
Zaharia M, Borthakur D, Sen Sarma J, Elmeleegy K, Shenker S, Stoica IDelay scheduling: A simple technique for achieving locality and fairness in cluster scheduling In: Proc of the 5th ACM/SIGOPS/Eurosys European Conference on Computer Systems. EuroSys ’10, 265–278.. ACM, New York, USA.
Taiani F, Lin S, Blair GS (2014) Gossipkit: A unified componentframework for gossip. Softw Eng IEEE Trans 40(2): 123–136. doi:10.1109/TSE.2013.50.
Babaoglu O, Canright G, Deutsch A, Caro GAD, Ducatelle F, Gambardella LM, Ganguly N, Jelasity M, Montemanni R, Montresor A, Urnes T (2006) Design patterns from biology for distributed computing. ACM Trans Auton Adapt Syst 1: 26–66.
Common Object Request Broker Architecture (CORBA). http://www.omg.org/spec/CORBA/. accessed 2 June 2015.
Seinturier L, Merle P, Rouvoy R, Romero D, Schiavoni V, Stefani JB (2012) A component-based middleware platform for reconfigurable service-oriented architectures. Softw Pract Experience 42(5): 559–583. doi:10.1002/spe.1077.
Hiltunen MA, Schlichting RD (2000) The cactus approach to building configurable middleware In: Proc of the workshop on Dependable System Middleware and Group Communication (DSMGC 2000) - NO PUBLISHER KNOWN.
van Renesse R, Birman K, Hayden M, Vaysburd A, Karr D (1998) Building adaptive systems using ensemble. Softw Pract Experience 28(9): 963–979.
Bhatti NT, Hiltunen MA, Schlichting RD, Chiu W (1998) Coyote: a system for constructing fine-grain configurable communication services. ACM Trans Comput Syst 16: 321–366.
Colyer A, Clement A (2004) Large-scale aosd for middleware In: Proc. of the 3rd International Conference on Aspect-oriented Software Development, AOSD’04, 56–65.. ACM, New York, USA.
Fleury M, Reverbel F (2003) The JBoss extensible server In: ACM/IFIP/USENIX Int. Middleware Conf. (Middleware’03), 344–373.
Fleurey F, Dehlen V, Bencomo N, Morin B, Jézéquel JM (2009) Modeling and validating dynamic adaptation In: Models in Software Engineering (MODELS’10). LNCS, 97–108.. Springer Verlag, New York, USA.
Morin B, Barais O, Nain G, Jezequel JM (2009) Taming dynamically adaptive systems using models and aspects In: Proc. of the 31st Int. Conf. on Soft. Engineering. ICSE ’09, 122–132.. IEEE Computer Society, Washington, DC, USA.
Killian CE, Anderson JW, Braud R, Jhala R, Vahdat A (2009) Building distributed systems using mace In: Proc of the 9th International Conference on Peer-to-Peer Computing, 91–92. doi:10.1109/P2P.2009.5284502.
Rodriguez A, Killian CE, Bhat S, Kostic D, Vahdat A (2004) MACEDON: methodology for automatically creating, evaluating, and designing overlay networks In: Proc of the 1st Symposium on Networked Systems Design and Implementation. NSDI 2004, 267–280.. USENIX. http://www.usenix.org/events/nsdi04/tech/rodriguez.html.
Behnel S, Buchmann A (2007) Models and languages for overlay networks In: Databases, Information Systems, and Peer-to-Peer Computing. Lecture Notes in Computer Science, 211–218.. Springer. doi:10.1007/978-3-540-71661-7_21.
Rivière E, Baldoni R, Li H, Pereira J (2007) Compositional gossip: A conceptual architecture for designing gossip-based applications. ACM SIGOPS Oper Syst Rev 41(5): 43–50.
Princehouse L, Birman K (2010) Code-partitioning gossip. SIGOPS Oper Syst Rev 43(4): 40–44. doi:10.1145/1713254.1713264.
Lin S, Taïani F, Bertier M, Blair G, Kermarrec AM (2011) Transparent componentisation: high-level (re)configurable programming for evolving distributed systems In: Proceedings of the 2011 ACM Symposium on Applied Computing. SAC ’11, 203–208.. ACM, TaiChung, Taiwan. doi:10.1145/1982185.1982233.
(2014) DIONASYS: Declarative and Interoperable Overlay Networks, Applications to Systems of Systems. http://www.chistera.eu/projects/dionasys. accessed 2 June 2015.
Mainland G, Morrisett G, Welsh M (2008) Flask: staged functional programming for sensor networks In: ICFP ’08: Proceeding of the 13th ACM SIGPLAN International Conference on Functional Programming, 335–346.. ACM, New York, USA. doi:10.1145/1411204.1411251.
Newton R, Morrisett G, Welsh M (2007) The regiment macroprogramming system In: IPSN ’07: Proceedings of the 6th International Conference on Information Processing in Sensor Networks, 489–498.. ACM, New York, USA. doi:10.1145/1236360.1236422.
Loo BT, Condie T, Garofalakis M, Gay DE, Hellerstein JM, Maniatis P, Ramakrishnan R, Roscoe T, Stoica I (2009) Declarative networking. Commun ACM 52(11): 87–95. doi:10.1145/1592761.1592785.
Chu D, Popa L, Tavakoli A, Hellerstein JM, Levis P, Shenker S, Stoica I (2007) The design and implementation of a declarative sensor network system In: SenSys ’07: Proceedings of the 5th International Conference on Embedded Networked Sensor Systems, 175–188.. ACM, New York, USA. doi:10.1145/1322263.1322281.
Rubinfeld R, Shapira A (2011) Sublinear time algorithms. Electronic Colloquium Comput Complex (ECCC) 18: 13.
Agarwal S, Mozafari B, Panda A, Milner H, Madden S, Stoica I (2013) Blinkdb: queries with bounded errors and bounded response times on very large data In: Proc of the 8th ACM/SIGOPS/Eurosys European Conference on Computer Systems. EuroSys’13, 29–42.. ACM, New York, USA.