Computational approaches to detect experts in distributed online communities: a case study on Reddit

Springer Science and Business Media LLC - 2023

Sofia Strukova¹, José A. Ruipérez-Valiente¹, Félix Gómez Mármol¹

¹Department of Information and Communications Engineering, University of Murcia, Murcia, Spain

Abstract

The key to the success of Question & Answer (Q&A) platforms is users who provide high-quality answers to the challenging questions posted across various topics of interest. For more than a decade, the expert finding problem has attracted much attention in information retrieval research. Motivated by gaps in expert identification across several Q&A portals, we inspect the feasibility of identifying data science experts on Reddit. Our method is based on manual coding in which two data science experts labelled not only expert and non-expert comments but also out-of-scope comments, which is a novel contribution to the literature and enables the identification of more groups of comments across web portals. We present a semi-supervised approach that combines 1113 labelled comments with 100,226 unlabelled comments during training. We show that it is possible to develop models that identify expert, non-expert and out-of-scope comments, with scores peaking at an AUC of 0.93, an accuracy of 0.83, an MAE of 0.15 and an R2 score of 0.69. The proposed model uses the activity behaviour of every user, drawing on Natural Language Processing (NLP), crowdsourced and user feature sets. We conclude that the NLP and user feature sets contribute the most to identifying these three classes, which suggests that the method can generalise well within the domain. Finally, we make a novel contribution by presenting different types of users on Reddit, which opens many future research directions.
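The abstract does not say which semi-supervised algorithm was used, so the following is only a minimal sketch of one common option, self-training with pseudo-labels, and not the authors' method. The nearest-centroid model, the two-dimensional feature vectors, the `min_margin` threshold and all function names are hypothetical choices made for illustration.

```python
from statistics import mean


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(X, y):
    """Toy nearest-centroid model: one mean vector per class label."""
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_class.items()}


def predict_with_margin(centroids, features):
    """Return (label, margin), where margin is the distance gap
    between the closest and the second-closest class centroid."""
    ranked = sorted((sq_dist(features, c), label) for label, c in centroids.items())
    best_dist, best_label = ranked[0]
    margin = ranked[1][0] - best_dist if len(ranked) > 1 else float("inf")
    return best_label, margin


def self_train(X_lab, y_lab, X_unlab, min_margin=0.2, max_rounds=5):
    """Repeatedly pseudo-label confident unlabelled points and refit."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        remaining = []
        for features in pool:
            label, margin = predict_with_margin(centroids, features)
            if margin >= min_margin:
                # Confident prediction: treat it as a training label.
                X_lab.append(features)
                y_lab.append(label)
            else:
                remaining.append(features)
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labelled, so stop early
        pool = remaining
    return fit_centroids(X_lab, y_lab), pool


# Hypothetical toy data: two made-up features per comment.
X_lab = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.7], [0.1, 0.8], [0.1, 0.1], [0.2, 0.0]]
y_lab = ["expert", "expert", "non-expert", "non-expert",
         "out-of-scope", "out-of-scope"]
X_unlab = [[0.85, 0.85], [0.15, 0.75], [0.15, 0.05], [0.5, 0.5]]

model, leftover = self_train(X_lab, y_lab, X_unlab)
print(predict_with_margin(model, [0.9, 0.9])[0])
print(leftover)
```

In this toy run the ambiguous point `[0.5, 0.5]` is never pseudo-labelled because it lies almost equally close to two class centroids. Leaving low-confidence items unlabelled is the usual safeguard against reinforcing early mistakes when the unlabelled pool is far larger than the labelled set.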

