Sentiment strength detection in short informal text
Abstract
A huge number of informal messages are posted every day in social network sites, blogs, and discussion forums. Emotions seem to be frequently important in these texts for expressing friendship, showing social support or as part of online arguments. Algorithms to identify sentiment and sentiment strength are needed to help understand the role of emotion in this informal communication and also to identify inappropriate or anomalous affective utterances, potentially associated with threatening behavior to the self or others. Nevertheless, existing sentiment detection algorithms tend to be commercially oriented, designed to identify opinions about products rather than user behaviors. This article partly fills this gap with a new algorithm, SentiStrength, to extract sentiment strength from informal English text, using new methods to exploit the de facto grammars and spelling styles of cyberspace. Applied to MySpace comments and with a lookup table of term sentiment strengths optimized by machine learning, SentiStrength is able to predict positive emotion with 60.6% accuracy and negative emotion with 72.8% accuracy, both based upon strength scales of 1–5. The former, but not the latter, is better than baseline and a wide range of general machine learning approaches.
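The lookup-table idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published SentiStrength implementation: the lexicon entries and their values are hypothetical, and both the rule of boosting a term by one point when its letters are stretched (e.g. "loooove") and the rule of reporting the strongest term per polarity are assumptions made for this sketch. Each message receives a positive and a negative strength, both on a 1–5 scale where 1 means no sentiment of that polarity was found.

```python
import re

# Hypothetical mini-lexicon (illustrative values only, not the published table).
# The sign gives the polarity; the magnitude gives a strength from 2 to 5.
TERM_STRENGTHS = {"love": 3, "good": 2, "nice": 2, "hate": -4, "bad": -2, "awful": -4}

STRETCH = re.compile(r"(.)\1{2,}")  # any character repeated three or more times


def lookup(token):
    """Return (signed strength, was_stretched) for a token; strength 0 if unknown."""
    if token in TERM_STRENGTHS:
        return TERM_STRENGTHS[token], False
    # Assumption: collapse stretched letters to two, then to one ("goood", "loooove").
    for replacement in (r"\1\1", r"\1"):
        candidate = STRETCH.sub(replacement, token)
        if candidate != token and candidate in TERM_STRENGTHS:
            return TERM_STRENGTHS[candidate], True
    return 0, False


def sentiment_strength(text):
    """Return (positive, negative) strengths, each on a 1-5 scale (1 = none found)."""
    positive, negative = 1, 1
    for token in re.findall(r"[a-z]+", text.lower()):
        score, stretched = lookup(token)
        if score == 0:
            continue
        # Assumption: stretched spelling adds one point of emphasis, capped at 5.
        magnitude = min(abs(score) + (1 if stretched else 0), 5)
        if score > 0:
            positive = max(positive, magnitude)
        else:
            negative = max(negative, magnitude)
    return positive, negative


print(sentiment_strength("I loooove this but the ending was bad"))
```

Keeping two separate scores, rather than one net value, lets a single comment register as both positive and negative, which matches the abstract's separate accuracy figures for positive and negative emotion.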