Robust keyword search in large attributed graphs

Springer Science and Business Media LLC - Tập 23 - Trang 502-524 - 2020
Spencer Bryson1, Heidar Davoudi1, Lukasz Golab2, Mehdi Kargar3, Yuliya Lytvyn2, Piotr Mierzejewski4, Jaroslaw Szlichta1,4, Morteza Zihayat3,4
1Ontario Tech University, Oshawa, Canada
2University of Waterloo, Waterloo, Canada
3Ted Rogers School of Management, Ryerson University, Toronto, Canada
4IBM Centre for Advanced Studies, Markham, Canada

Tóm tắt

There is a growing need to explore attributed graphs such as social networks, expert networks, and biological networks. A well-known mechanism for non-technical users to explore such graphs is keyword search, which receives a set of query keywords and returns a connected subgraph that contains the keywords. However, existing approaches, such as methods based on shortest paths between nodes containing the query keywords, may generate weakly-connected answers as they ignore the structure of the whole graph. To address this issue, we formulate and solve the robust keyword search problem for attributed graphs to find strongly-connected answers. We prove that the problem is NP-hard and we propose a solution based on a random walk with restart (RWR). To improve the efficiency and scalability of RWR, we use Monte Carlo approximation and we also propose a distributed version, which we implement in Apache Spark. Finally, we provide experimental evidence of the efficiency and effectiveness of our approach on real-world graphs.

Tài liệu tham khảo

Anagnostopoulos, A., Becchetti, L., Castillo, C., Gionis, A., & Leonardi, S. (2012). Online team formation in social networks. In WWW ’12, pp. 839–848. Bahmani, B., Chowdhury, A., & Goel, A. (2010). Fast incremental and personalized pagerank. Proceedings of the VLDB Endowment, 4(3), 173–184. Bai, L., Liang, J., Du, H., & Guo, Y. (2018). A novel community detection algorithm based on simplification of complex networks. Knowledge-Based Systems, 143, 58–64. Balmin, A., Hristidis, V., & Papakonstantinou, Y. (2004). Objectrank: Authority-based keyword search in databases. In VLDB, pp. 564–575. Bhalotia, Gaurav, Hulgeri, Arvind, Nakhe, Arvind, Chakrabarti, Soumen, & Sudarshan, Shashank. (2002). Keyword searching and browsing in databases using banks. In Proceedings 18th international conference on data engineering, pp. 431–440. IEEE. Cavallari, S., Zheng, V., Cai, H., Chang, K., & Cambria, E. (2017). Learning community embedding with community detection and node embedding on graphs. In CIKM, pp. 337–386. Chakrabarti, S. (2007). Dynamic personalized pagerank in entity relation graphs. In WWW, pp. 571–580. Cui, W., Xiao, Y., Wang, H., & Wang, W. (2014). Local search of communities in large graphs. In SIGMOD, pp. 991–1002. Ding, B., Yu, J., Wang, S., Qin, L., Zhang, X., & Lin, X. (2007). Finding top-k min-cost connected trees in databases. In ICDE, pp. 836–845. Ding, B., Yu, J. X., Wang, S., Qin, L., Zhang, X., & Lin, X. (2007). Finding top-k min-cost connected trees in databases. In 2007 IEEE 23rd international conference on data engineering, pp. 836–845. IEEE. Duret, L., & Mouchiroud, D. (1999). Expression pattern and surprisingly, gene length shape codon usage in caenorhabditis, drosophila, and arabidopsis. Proceedings of the National Academy of Sciences, 13(96), 4482–4487. Duret, L., & Mouchiroud, D. (1999). Expression pattern and, surprisingly, gene length shape codon usage in caenorhabditis, drosophila, and arabidopsis. In Proceedings of the national academy of sciences of the United States of America, pp. 4482–4487. Fang, Y., Cheng, R., Luo, S., & Hu, J. (2016). Effective community search for large attributed graphs. PVLDB, 9(12), 1233–1244. Gonzalez, M., & Kann, M. (2012). Protein interactions and disease. PLoS Computational Biology, 8(12), 819–830. Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 855–864. ACM. Hajiabadi, M., Zare, H., & Bobarshad, H. (2017). IEDC: An integrated approach for overlapping and non-overlapping community detection. Knowledge-Based Systems, 123, 188–199. Han, S., Zou, L., Yu, J.X., & Zhao, D. (2017). Keyword search on RDF graphs: A query graph assembly approach. In CIKM, pp. 227–236. He, H., Wang, H., Yang, J., & Yu, P. (2007). BLINKS: Ranked keyword searches on graphs. In SIGMOD, pp. 305–316. Huang, X., & Lakshmanan, L. V. S. (2017). Attribute-driven community search. PVLDB, 10(9), 949–960. Ingvarsson, P. (2007). Gene expression and protein length influence codon usage and rates of sequence evolution in populus tremula. Molecular Biology and Evolution, 24(3), 836–844. Ingvarsson, Pär K. (2007). Gene expression and protein length influence codon usage and rates of sequence evolution in populus tremula. Molecular Biology and Evolution, 24(3), 836–844. Jay, S. M. (2019). Protein silencing to stop a silent killer. Science Translational Medicine, 11(473), 529. Jeong, Y., Pan, Y., Rathore, S., Kim, B., & Park, J. H. (2019). A parallel team formation approach using crowd intelligence from social network. Computers in Human Behavior, 101, 429–434. Kacholia, V., Pandit, S., Chakrabarti, S., Sudarshan, S., Desai, R., & Karambelkar, H. (2005). Bidirectional expansion for keyword search on graph databases. In VLDB, pp. 505–516. Kargar, M., & An, A. (2011). Keyword search in graphs: Finding r-cliques. PVLDB, 10(4), 681–692. Kargar, M., An, A., & Yu, X. (2014). Efficient duplication free and minimal keyword search in graphs. TKDE, 26(7), 1657–1669. Kargar, M., Golab, L., & Szlichta, J. (2016). eGraphSearch: Effective keyword search in graphs. In CIKM, system demonstration, pp. 2461–2464. Kargar, M., Zihayat, M., & An, A. (2013). Finding affordable and collaborative teams from a network of experts. In SDM, pp. 587–595. Kasneci, G., Ramanath, M., & Sozio, M. (2009). STAR: Steiner-tree approximation in relationship graphs. In ICDE, pp. 868–879. Lao, N., & Cohen, W. (2010). Fast query execution for retrieval models based on path-constrained random walks. In KDD, pp. 881–888. Lappas, T., Liu, L., & Terzi, E. (2009). Finding a team of experts in social networks. In KDD, pp. 467–476. Le, W., Li, F., Kementsietsidis, A., & Duan, S. (2014). Scalable keyword search on large RDF data. TKDE, 26(11), 2774–2788. Leskovec, J., Rajaraman, A., & Ullman, J. (2014). Mining of massive datasets (2nd ed.). Cambridge: Cambridge University Press. Li, G., Ooi, B., Feng, J., Wang, J., & Zhou, L. (2008). EASE: Efficient and adaptive keyword search on unstructured, semi-structured and structured data. In SIGMOD, pp. 903–904. Li, R. H., Qin, L., Yu, J. X., & Mao, R. (2015). Influential community search in large networks. PVLDB, 8(5), 509–520. Majumder, A., Datta, S., & Naidu, K. (2012). Capacitated team formation problem on social networks. In KDD, pp. 1005–1013. Markowetz, A., Ying, Y., & Papadias, D. (2007). Keyword search on relational data streams. In SIGMOD, pp. 605–616. Pinero, J. (2017). DisGeNET: A comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Research, 45(D1), 833–839. Prud’hommeaux, E. (2008). SPARQL query language for RDF, W3C recommendation. http://www.w3.org/TR/rdf-sparql-query/. Qin, L., Yu, J., Chang, L., & Tao, Y. (2009). Querying communities in relational databases. In ICDE, pp. 724–735. Sozio, M., & Gionis, A. (2010). The community-search problem and how to plan a successful cocktail party. In KDD, pp. 939–948. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z.. (2008). Arnetminer: extraction and mining of academic social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 990–998. ACM. Termehchy, A., & Winslett, M. (2011). Using structural information in XML keyword search effectively. ACM Transactions on Database Systems (TODS), 36(2), 1–49. Wang, H., & Aggarwal, C. C. (2010). A survey of algorithms for keyword search on graph data, (pp. 249–273). Springer US, Boston, MA. Wang, W., He, Z., Shi, P., Wu, W., Jiang, Y., An, B., et al. (2019). Strategic social team crowdsourcing: Forming a team of truthful workers for crowdsourcing in social networks. IEEE Transactions on Mobile Computing, 18(6), 1419–1432. Yang, Y., Agrawal, D., Jagadish, H. V., Tung, A. K. H., & Wu, S. (2019). An efficient parallel keyword search engine on knowledge graphs. In 2019 IEEE 35th international conference on data engineering (ICDE), pp. 338–349. Yin, X., Qu, C., Wang, Q., Wu, F., Liu, B., Chen, F., et al. (2019). Social connection aware team formation for participatory tasks. IEEE Access, 6, 20309–20319. Zhang, J., Yu, P. S., & Lv, Y. (2017). Enterprise employee training via project team formation. In WSDM, pp. 3–12.