Social data provenance framework based on zero-information loss graph database

Social Network Analysis and Mining - Volume 12 - Pages 1-25 - 2022
Asma Rani1,2, Navneet Goyal2, Shashi K. Gadia3
1Department of CSE, DBRAIT, Port Blair, India
2CSIS Department, BITS Pilani, Pilani, India
3Department of CS, Iowa State University, Ames, USA

Abstract

Social media has become a common platform for global communication because it disseminates information rapidly to a large audience. This popularity raises a crucial challenge: capturing the social data provenance of a piece of information published on social media. Social data provenance describes the source and derivation process of a piece of digital content, and when it has been updated since its creation. It helps determine the reliability, authenticity, and trustworthiness of a piece of information, and explains how, when, and by whom the information was published. In this paper, we propose a social data provenance (SDP) framework based on a zero-information loss graph database (ZILGDB). The proposed framework supports historical data queries and querying through time by means of update management in ZILGDB. It can capture provenance for a query set that includes select, aggregate, and data update queries with insert, delete, and update operations. It also provides detailed provenance analysis through visualization, together with efficient multi-depth provenance querying, to determine both the direct and indirect sources of a piece of digital content. We conduct a real-life use case study to evaluate the usefulness of the proposed framework in terrorist attack investigation. We evaluate the performance of the proposed framework in terms of average execution time for various provenance queries and provenance capturing overhead for a query set.
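The two ideas named in the abstract, keeping every past state so that historical queries remain answerable and tracing sources across several derivation steps, can be illustrated with a small in-memory sketch. This is not the authors' ZILGDB implementation and does not use a graph database. The names `Version`, `HistoryGraph`, `write`, `value_at`, and `provenance` are hypothetical, and the sketch assumes that writes arrive in increasing timestamp order.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set


@dataclass
class Version:
    value: Any        # stored value; None marks a deletion
    valid_from: int   # logical timestamp at which this version became current
    operation: str    # "insert", "update", or "delete"
    user: str         # who performed the operation


@dataclass
class HistoryGraph:
    # node id -> append-only list of versions (older versions are never overwritten)
    versions: Dict[str, List[Version]] = field(default_factory=dict)
    # node id -> ids of the nodes it was directly derived from
    sources: Dict[str, Set[str]] = field(default_factory=dict)

    def write(self, node: str, value: Any, ts: int, operation: str,
              user: str, derived_from: Iterable[str] = ()) -> None:
        """Record an insert, update, or delete as a new version of `node`."""
        self.versions.setdefault(node, []).append(
            Version(value, ts, operation, user))
        self.sources.setdefault(node, set()).update(derived_from)

    def value_at(self, node: str, ts: int) -> Any:
        """Historical query: the value of `node` as of time `ts` (None if absent)."""
        current = None
        for version in self.versions.get(node, []):
            if version.valid_from <= ts:
                current = version.value
        return current

    def provenance(self, node: str, depth: int) -> Dict[str, int]:
        """Multi-depth query: map each source within `depth` steps to its distance."""
        found: Dict[str, int] = {}
        frontier = {node}
        for level in range(1, depth + 1):
            next_frontier = set()
            for current in frontier:
                for source in self.sources.get(current, ()):
                    if source != node and source not in found:
                        found[source] = level
                        next_frontier.add(source)
            frontier = next_frontier
        return found


if __name__ == "__main__":
    g = HistoryGraph()
    g.write("tweet1", "original text", 1, "insert", "alice")
    g.write("tweet2", "RT original text", 2, "insert", "bob", ["tweet1"])
    g.write("tweet3", "quote of RT", 3, "insert", "carol", ["tweet2"])
    g.write("tweet1", "edited text", 4, "update", "alice")
    print(g.value_at("tweet1", 2))     # state before the edit
    print(g.provenance("tweet3", 2))   # direct and indirect sources
```

In this toy model, a depth of 1 returns only direct sources, while larger depths also return indirect ones, which mirrors the distinction the abstract draws between direct and indirect sources of a piece of content.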
