Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons

Journal of Information Science, Vol. 44, No. 4, pp. 491-511, 2018
Christopher S. G. Khoo (1), Sathik Basha Johnkhan (1)
(1) Wee Kim Wee School of Communication and Information, Nanyang Technological University, Singapore

Abstract

This article introduces a new general-purpose sentiment lexicon called WKWSCI Sentiment Lexicon and compares it with five existing lexicons: Hu & Liu Opinion Lexicon, Multi-perspective Question Answering (MPQA) Subjectivity Lexicon, General Inquirer, National Research Council Canada (NRC) Word-Sentiment Association Lexicon and Semantic Orientation Calculator (SO-CAL) lexicon. The effectiveness of the sentiment lexicons for sentiment categorisation at the document level and sentence level was evaluated using an Amazon product review data set and a news headlines data set. WKWSCI, MPQA, Hu & Liu and SO-CAL lexicons are equally good for product review sentiment categorisation, obtaining accuracy rates of 75%–77% when appropriate weights are used for different categories of sentiment words. However, when a training corpus is not available, Hu & Liu obtained the best accuracy with a simple-minded approach of counting positive and negative words for both document-level and sentence-level sentiment categorisation. The WKWSCI lexicon obtained the best accuracy of 69% on the news headlines sentiment categorisation task, and the sentiment strength values obtained a Pearson correlation of 0.57 with human-assigned sentiment values. It is recommended that the Hu & Liu lexicon be used for product review texts and the WKWSCI lexicon for non-review texts.
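The word-counting baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists are tiny made-up samples rather than entries from any of the six lexicons, the function names are hypothetical, and the tie-breaking rule (equal counts give "neutral") is an assumption, since the abstract does not say how ties are handled. Negation and sentiment-strength weights are deliberately left out.

```python
import re

# Toy word lists for illustration only; NOT taken from any of the six lexicons.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def count_sentiment(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (positive_count, negative_count) for the lexicon words in text."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    return pos, neg


def classify(text, positive=POSITIVE, negative=NEGATIVE):
    """Label text by comparing raw counts; ties are 'neutral' (an assumption)."""
    pos, neg = count_sentiment(text, positive, negative)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in ("Great phone, I love the battery",
                   "Terrible screen and poor sound",
                   "It arrived on Tuesday"):
        print(review, "->", classify(review))
```

The same function works for a whole review or a single sentence, since it only counts matching tokens. Swapping in a real lexicon would mean replacing the two sets with that lexicon's positive and negative word lists.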
