Sentiment strength detection in short informal text

Wiley - Volume 61, Issue 12 - Pages 2544–2558 - 2010
Mike Thelwall¹, Kevan Buckley¹, George Paltoglou¹, Di Cai¹, Arvid Kappas²
¹Statistical Cybermetrics Research Group, School of Computing and Information Technology, University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1SB, UK
²School of Humanities and Social Sciences, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany

Abstract

A huge number of informal messages are posted every day on social network sites, blogs, and discussion forums. Emotions often seem to play an important role in these texts, whether for expressing friendship, showing social support, or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication, and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior toward the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, which extracts sentiment strength from informal English text using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments, with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength predicts positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both measured on strength scales of 1–5. The positive result, but not the negative one, is better than the baseline and a wide range of general machine learning approaches.
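The abstract describes SentiStrength as scoring informal text with a lookup table of term sentiment strengths and reporting separate positive and negative strengths, each on a 1–5 scale. The Python sketch below illustrates the general shape of such a lexicon-based, dual-scale scorer. It is not SentiStrength's implementation: the lexicon entries, the booster words, the rule of taking the strongest term on each scale, and the +1 adjustments for boosters and for repeated-letter emphasis (e.g. "haaappy") are all hypothetical simplifications chosen for illustration, and the function names `normalise` and `sentiment_strength` are invented here.

```python
import re

# Hypothetical toy lexicon, NOT SentiStrength's machine-learned table.
# Positive terms carry +2..+5, negative terms -2..-5; unknown words count as 0.
LEXICON = {
    "love": 3,
    "happy": 2,
    "great": 3,
    "hate": -4,
    "sad": -2,
    "awful": -3,
}

# Hypothetical booster words: each adds 1 to the strength of the next word
# if that word is a sentiment term.
BOOSTERS = {"very", "really", "so"}

# A letter repeated 3+ times in a row, e.g. the "aaa" in "haaappy".
REPEATS = re.compile(r"(\w)\1{2,}")


def normalise(token):
    """Map an informally spelled token to a known term, if possible.

    Returns (term, emphasised). Runs of 3+ identical letters are collapsed
    first to two letters ("happpy" -> "happy"), then to one ("loooove" ->
    "love"); emphasised is True only when a known term was recovered that way.
    """
    if token in LEXICON or token in BOOSTERS:
        return token, False
    for repl in (r"\1\1", r"\1"):
        candidate = REPEATS.sub(repl, token)
        if candidate != token and (candidate in LEXICON or candidate in BOOSTERS):
            return candidate, True
    return token, False


def sentiment_strength(text):
    """Return (positive, negative) strengths, each on a 1..5 scale.

    In this sketch 1 means "no sentiment detected" on that scale. Each scale
    keeps the strongest matching term (an assumption), after adding +1 for a
    preceding booster and +1 for repeated-letter emphasis, capped at 5.
    """
    positive, negative = 1, 1
    boost = 0
    for token in re.findall(r"[a-z']+", text.lower()):
        term, emphasised = normalise(token)
        if term in BOOSTERS:
            boost = 1
            continue
        strength = LEXICON.get(term, 0)
        if strength:
            extra = boost + (1 if emphasised else 0)
            if strength > 0:
                positive = max(positive, min(5, strength + extra))
            else:
                negative = max(negative, min(5, -strength + extra))
        boost = 0  # a booster only affects the word immediately after it
    return positive, negative


if __name__ == "__main__":
    for message in ["I loooove this!", "so sad and I hate it", "hello world"]:
        print(message, "->", sentiment_strength(message))
```

Under this toy lexicon, `sentiment_strength("really haaappy")` returns `(4, 1)`: "happy" has strength 2, plus 1 for the booster "really" and 1 for the stretched spelling. Reporting two independent scales, rather than a single polarity, is what lets a message score as both strongly positive and strongly negative at once.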
