A Data Quality Multidimensional Model for Social Media Analysis
Business & Information Systems Engineering - Pages 1-23 - 2023
Abstract
Social media platforms have become a new source of useful information for companies. Extracting business value from social media requires first analyzing the quality of the relevant data and then developing practical business intelligence solutions. This paper aims to build high-quality datasets for social business intelligence (SoBI). The proposed method offers an integrated and dynamic approach to identifying the relevant quality metrics for each analysis domain. It employs a novel multidimensional data model for constructing cubes with impact measures for various quality metrics. In this model, quality metrics and indicators are organized along two main axes. The first concerns the kind of facts to be extracted, namely posts, users, and topics. The second refers to the quality perspectives to be assessed, namely credibility, reputation, usefulness, and completeness. Additionally, quality cubes include a user-role dimension so that quality metrics can be evaluated in terms of the users' business roles. To demonstrate the usefulness of this approach, the authors applied their method to two separate domains: automotive business and natural disaster management. Results show that the trade-off between quantity and quality for social media data is concentrated in a small percentage of relevant users. Thus, data filtering can be performed simply by ranking the posts according to the quality metrics identified with the proposed method. As far as the authors know, this is the first approach that integrates both the extraction of analytical facts and the assessment of social media data quality in the same framework.
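To make the two-axis organization more concrete, the following is a minimal Python sketch of how a quality cube with a user-role dimension might be represented and queried. It is not the authors' implementation. The fact types and quality perspectives come from the abstract; the class `QualityCell`, the functions `roll_up` and `rank_posts`, the role names, the scores, and the choice of a plain average as the aggregation are all assumptions made for illustration, since the abstract does not specify the impact measures.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Dimension members named in the abstract.
FACT_TYPES = ("post", "user", "topic")
PERSPECTIVES = ("credibility", "reputation", "usefulness", "completeness")


@dataclass(frozen=True)
class QualityCell:
    """One cell of a hypothetical quality cube (illustrative, not from the paper)."""
    fact_type: str    # post, user, or topic
    perspective: str  # quality perspective being assessed
    user_role: str    # business role of the author (made-up values below)
    item_id: str      # identifier of the post, user, or topic
    score: float      # metric value, assumed normalized to [0, 1]

    def __post_init__(self):
        if self.fact_type not in FACT_TYPES:
            raise ValueError(f"unknown fact type: {self.fact_type}")
        if self.perspective not in PERSPECTIVES:
            raise ValueError(f"unknown perspective: {self.perspective}")


def roll_up(cells, *dims):
    """Average the scores grouped by the given dimension names."""
    groups = defaultdict(list)
    for cell in cells:
        key = tuple(getattr(cell, dim) for dim in dims)
        groups[key].append(cell.score)
    return {key: mean(scores) for key, scores in groups.items()}


def rank_posts(cells):
    """Return post ids ordered by mean score across perspectives, best first."""
    posts = [c for c in cells if c.fact_type == "post"]
    by_post = roll_up(posts, "item_id")
    ranked = sorted(by_post.items(), key=lambda kv: kv[1], reverse=True)
    return [key[0] for key, _ in ranked]


# Made-up sample data: two posts and one user, with illustrative roles.
cells = [
    QualityCell("post", "credibility", "journalist", "p1", 0.9),
    QualityCell("post", "usefulness", "journalist", "p1", 0.7),
    QualityCell("post", "credibility", "customer", "p2", 0.4),
    QualityCell("post", "usefulness", "customer", "p2", 0.6),
    QualityCell("user", "reputation", "journalist", "u1", 0.8),
]

if __name__ == "__main__":
    print(roll_up(cells, "user_role", "perspective"))
    print(rank_posts(cells))
```

Rolling up by `user_role` and `perspective` mirrors evaluating quality metrics per business role, and `rank_posts` corresponds to the filtering step in which posts are ordered by their quality scores. A real system would likely weight the perspectives per analysis domain rather than averaging them uniformly.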