StanceVis Prime: visual analysis of sentiment and stance in social media texts
Abstract
Text visualization and visual text analytics methods have been successfully applied to various tasks related to the analysis of individual text documents and large document collections, such as summarization of main topics or identification of events in discourse. Visualization of sentiments and emotions detected in textual data has also become an important topic of interest, especially with regard to data originating from social media. Despite the growing interest in this topic, the research problem of detecting and visualizing various stances, such as rudeness or uncertainty, has not been adequately addressed by existing approaches. The challenges associated with this problem include the development of the underlying computational methods and the visualization of the corresponding multi-label stance classification results. In this paper, we describe our work on a visual analytics platform, called StanceVis Prime, which has been designed for the analysis of sentiment and stance in temporal text data from various social media data sources. The use case scenarios intended for StanceVis Prime include social media monitoring and research in sociolinguistics. The design was motivated by the requirements of collaborating domain experts in linguistics as part of a larger research project on stance analysis. Our approach involves consuming documents from several text stream sources and applying sentiment and stance classification, resulting in multiple data series associated with the source texts. StanceVis Prime provides end users with an overview of similarities between the data series based on dynamic time warping analysis, as well as detailed visualizations of data series values. Users can also retrieve the documents corresponding to the data series and conduct both distant and close reading of them.
We demonstrate our approach with case studies involving political targets of interest and several social media data sources and report preliminary user feedback received from a domain expert.
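The similarity overview mentioned above relies on dynamic time warping (DTW). As a rough illustration of the idea only, the following is a minimal sketch of the classic dynamic-programming formulation of DTW with an absolute-difference local cost. It is not taken from StanceVis Prime: the function name `dtw_distance`, the choice of local cost, and the sample series values are all hypothetical assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    The local cost is the absolute difference between two values; this is
    an illustrative assumption, not necessarily what StanceVis Prime uses.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both series
            )
    return cost[n][m]


# Hypothetical per-interval counts for three classifier output series.
uncertainty = [0, 1, 3, 7, 3, 1, 0, 0]
rudeness = [0, 0, 1, 3, 7, 3, 1, 0]  # same shape as above, one step later
negative = [5, 5, 4, 5, 5, 4, 5, 5]  # flat series with a different profile

shifted = dtw_distance(uncertainty, rudeness)
different = dtw_distance(uncertainty, negative)
print(shifted, different)
```

In this sketch, the two series that share a shape but are offset in time come out as closer to each other than to the flat series, which is the property that makes DTW a plausible basis for comparing data series whose peaks do not line up exactly.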