Distributed computing connected components with linear communication cost

Springer Science and Business Media LLC - Tập 36 - Trang 555-592 - 2018

Xing Feng¹, Lijun Chang², Xuemin Lin³, Lu Qin¹, Wenjie Zhang³, Long Yuan⁴

¹University of Technology Sydney, Sydney, Australia

²University of Sydney, Sydney, Australia

³University of New South Wales, Sydney, Australia

⁴East China Normal University, Shanghai, China

Tóm tắt

The paper studies three fundamental problems in graph analytics, computing connected components (CCs), biconnected components (BCCs), and 2-edge-connected components (ECCs) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing CCs, BCCs and ECCs of a big graph has received increasing interests. As with the existing research efforts, we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur $$O(m\times \#\text {supersteps})$$ total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since the network communication speed is usually much slower than the computation speed, communication costs are the dominant costs of the total running time in the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to compute CCs and BCCs with O(m) total communication cost. The total computation costs of our techniques are also smaller than that of the existing techniques in practice, though theoretically almost the same. Moreover, we also study distributed computing ECCs. We are the first to study this problem and an approach with O(m) total communication cost is proposed. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude regarding the total running time.

Tài liệu tham khảo

Awerbuch, B., Shiloach, Y.: New connectivity and MSF algorithms for shuffle-exchange network and PRAM. IEEE Trans. Comput. 10, 1258–1263 (1987) Ceccarello, M., Pietracaprina, A., Pucci, G., Upfal, E.: Space and time efficient parallel graph decomposition, clustering and diameter approximation. In: Proceedings of SPAA’15 (2015) Chang, L., Yu, J. X., Qin, L., Lin, X., Liu, C., Liang, W.: Efficiently computing k-edge connected components via graph decomposition. In: Proceedings of SIGMOD’13 (2013) Ching, A., Kunz, C.: Giraph: large-scale graph processing infrastructure on hadoop. Hadoop Summit (2011) Cohen, J.: Graph twiddling in a mapreduce world. Comput. Sci. Eng. 11, 29–41 (2009) Cormen, T.H., Stein, C., Rivest, R.L., Leiserson, C.E.: Introduction to Algorithms. McGraw-Hill Higher Education, New York (2001) Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. In: Proceedings of OSDI’04 (2004) Feng, X., Chang, L., Lin, X., Qin, L., Zhang, W.: Computing connected components with linear communication cost in pregel-like systems. In: Proceedings of ICDE’16 (2016) Fortunato, S.: Community detection in graphs. Phys. Rep. 486, 75–174 (2010) Gibbons, A.: Algorithmic Graph Theory. Cambridge University Press, Cambridge (1985) Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: graph processing in a distributed dataflow framework. In: Proceedings of OSDI’14 (2014) Henry, N., Bezerianos, A., Fekete, J.: Improving the readability of clustered social networks using node duplication. IEEE Trans. Vis. Comput. Graph. 14, 1317–1324 (2008) Hopcroft, J.E., Tarjan, R.E.: Efficient algorithms for graph manipulation [H] (algorithm 447). Commun. ACM 16, 372–378 (1973) Kang, U., Tsourakakis, C.E., Faloutsos, C.: PEGASUS: a peta-scale graph mining system. In: Proceedings of ICDM’09 (2009) Kang, U., McGlohon, M., Akoglu, L., Faloutsos, C.: Patterns on the connected components of terabyte-scale graphs. In: Proceedings of ICDM’10 (2010) Kavitha, T., Liebchen, C., Mehlhorn, K., Michail, D., Rizzi, R., Ueckerdt, T., Zweig, K.A.: Cycle bases in graphs characterization, algorithms, complexity, and applications. Comput. Sci. Rev. 3, 199–243 (2009) Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5, 716–727 (2012) Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings of SIGMOD’10 (2010) Miller, G.L., Peng, R., Xu, S.C.: Parallel graph decompositions using random shifts. In: Proceedings of SPAA’13 (2013) Miller, G.L., Ramachandran, V.: Efficient parallel ear decomposition with applications. MSRI 135, 162 (1986) Munagala, K., Ranade, A.G.: I/o-complexity of graph algorithms. In: Proceedings of SODA’99 (1999) Nanda, S., Kotz, D.: Localized bridging centrality for distributed network analysis. In: Proceedings of ICCCN’08 (2008) Przulj, N., Wigle, D., Jurisica, I.: Functional topology in a network of protein interactions. Bioinformatics 20, 340–348 (2004) Qin, L., Yu, J.X., Chang, L., Cheng, H., Zhang, C., Lin, X.: Scalable big graph processing in mapreduce. In: Proceedings of SIGMOD’14 (2014) Ramachandran, V.: Parallel open ear decomposition with applications to graph biconnectivity and triconnectivity. Citeseer (1992) Rastogi, V., Machanavajjhala, A., Chitnis, L., Sarma, A.D.: Finding connected components in map-reduce in logarithmic rounds. In: Proceedings of ICDE’13 (2013) Salihoglu, S., Widom, J.: Optimizing graph algorithms on Pregel-like systems. PVLDB 7, 577–588 (2014) Sanders, P.: Random permutations on distributed, external and hierarchical memory. Inf. Process. Lett. 67, 305–309 (1998) Schmidt, J.M.: A simple test on 2-vertex- and 2-edge-connectivity. Inf. Process. Lett. 113, 241–244 (2013) Shiloach, Y., Vishkin, U.: An o(log n) parallel connectivity algorithm. J. Algorithms 3, 57–67 (1982) Slota, G.M., Madduri, K.: Simple parallel biconnectivity algorithms for multicore platforms. In: Proceedings of HiPC’14 (2014) Stanton, I.: Streaming balanced graph partitioning algorithms for random graphs. In: Proceedings of SODA’14 (2014) Tarjan, R.E., Vishkin, U.: An efficient parallel biconnectivity algorithm. SIAM J. Comput. 14, 862–874 (1985) Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From “think like a vertex” to “think like a graph”. PVLDB 7, 193–204 (2013) Turau, V.: Computing bridges, articulations, and 2-connected components in wireless sensor networks. In: Proceedings of ALGOSENSORS’06 (2006) Valiant, L.G.: A bridging model for parallel computation. Commun ACM 33, 103–111 (1990) Vega, D., Cerda-Alabern, L., Navarro, L., Meseguer, R.: Topology patterns of a community network: Guifi.net. In: Proceedings of WiMob’12 (2012) Yan, D., Cheng, J., Xing, K., Lu, Y., Ng, W., Bu, Y.: Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB 7, 1821–1832 (2014) Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of NSDI’12 (2012)

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Công cụ kiểm tra chính tả và thể thức Viver

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA