Celebrity profiling through linguistic analysis of digital social networks
Abstract
Digital social networks have become an essential source of information because celebrities use them to share their opinions, ideas, thoughts, and feelings. This makes digital social networks one of the preferred channels through which celebrities promote themselves and attract new followers. This paper proposes a feature selection model for classifying celebrity profiles based on their use of the digital social network Twitter. The model analyzes lexical, syntactic, symbolic, engagement, and complementary-information features of celebrities' posts and uses them to estimate the celebrities' demographic characteristics and influence. Classification with these new features achieved F1 scores of 0.65 for fame, 0.88 for gender, 0.37 for birth year, and 0.57 for occupation. With these new features, average accuracy improved by 0.14. As a result, the features extracted from linguistic markers improved the performance of the Fame and Gender prediction models and made the model results easier to interpret. In particular, use of the third-person singular is highly predictive in the Fame model.
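The abstract names the feature families but not how each one is computed. The block below is a minimal, standard-library-only sketch, assuming English-language posts, of how per-profile features of this kind (lexical, pronoun usage, symbolic, engagement) could be aggregated, plus a macro-averaged F1 for scoring a multi-class task such as occupation. Everything here is an illustrative assumption and not the authors' implementation: the `Post` record, the `extract_features` and `macro_f1` helpers, the pronoun list, the regular expressions, the feature names, and the choice of macro averaging (the abstract does not state which F1 averaging was used). The complementary-information features are omitted because the abstract does not describe them.

```python
import re
from dataclasses import dataclass

# Hypothetical English third-person singular pronoun list (illustrative only).
THIRD_PERSON_SINGULAR = {"he", "she", "him", "her", "his", "hers", "himself", "herself"}
URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")  # words, #hashtags and @mentions


@dataclass
class Post:
    """Hypothetical record for one post and its engagement counts."""
    text: str
    likes: int = 0
    retweets: int = 0


def extract_features(posts):
    """Aggregate illustrative per-profile features over a list of Post objects."""
    words, hashtags, mentions, urls = [], 0, 0, 0
    for post in posts:
        urls += len(URL_RE.findall(post.text))
        # Strip URLs before tokenizing so they are not counted as words.
        for tok in TOKEN_RE.findall(URL_RE.sub(" ", post.text).lower()):
            if tok.startswith("#"):
                hashtags += 1
            elif tok.startswith("@"):
                mentions += 1
            else:
                words.append(tok)
    n_posts = max(len(posts), 1)  # guard against division by zero
    n_words = max(len(words), 1)
    return {
        # Lexical features
        "type_token_ratio": len(set(words)) / n_words,
        "mean_words_per_post": len(words) / n_posts,
        # Pronoun usage
        "third_person_singular_rate": sum(w in THIRD_PERSON_SINGULAR for w in words) / n_words,
        # Symbolic features
        "hashtags_per_post": hashtags / n_posts,
        "mentions_per_post": mentions / n_posts,
        "urls_per_post": urls / n_posts,
        # Engagement features
        "mean_likes": sum(p.likes for p in posts) / n_posts,
        "mean_retweets": sum(p.retweets for p in posts) / n_posts,
    }


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


posts = [
    Post("She just won the award! #Oscars @academy https://t.co/x", likes=120, retweets=30),
    Post("Thank you all, he and I are grateful", likes=80, retweets=10),
]
print(extract_features(posts))
print(macro_f1(["male", "female", "male", "female"], ["male", "male", "male", "female"]))
```

In a full pipeline, each profile's feature dictionary would become one row of a feature matrix passed to a classifier; this abstract does not say which classifier the authors used, so none is assumed here.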
Keywords
#celebrities #linguistic analysis #digital social networks #profile classification #Twitter
