Classification analysis of Kouji Uno’s novels using topic model

Behaviormetrika - Volume 47 - Pages 189-212 - 2019
Xueqin Liu¹, Mingzhe Jin²
¹Graduate School of Culture and Information Science, Doshisha University, Kyotanabe, Japan
²Faculty of Culture and Information Science, Doshisha University, Kyotanabe, Japan

Abstract

Kouji Uno is a prominent Japanese littérateur whose creative activity was interrupted twice. Literary critics hold that Uno’s writing style changed when he resumed writing. This paper aims to determine how Uno’s creative career divides into phases by using statistical methods to investigate the stylistic characteristics of his novels. To this end, a topic model was applied to classify Uno’s novels and to compare the characteristics of each resulting group. The results show that Uno’s novels can be classified into three groups, separated approximately by the two non-productive periods, and that the novels in each group display different stylistic characteristics. Interestingly, his stylistic characteristics had already begun to change before the interruptions to his writing. It is therefore more reasonable to view Uno’s writing style as having started to change before the interruptions, with the change realised to some extent only after he resumed writing.
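The abstract does not spell out the classification pipeline. As a rough illustration only, the sketch below assumes the topic model is latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, and then applies a hypothetical grouping rule that labels each text with its most probable topic. It assumes the novels have already been segmented into tokens (Japanese text has no spaces, so a morphological analyser would be needed in practice); whether those tokens should be words or stylistic units such as part-of-speech tags is also an assumption, not something stated here. The function names `lda_gibbs` and `group_by_dominant_topic` and all parameter defaults are illustrative.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=300, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of tokens (already segmented).
    Returns theta: one list of topic proportions per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    ndk = [[0] * n_topics for _ in docs]      # tokens of doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]  # occurrences of word w assigned to topic k
    nk = [0] * n_topics                       # total tokens assigned to topic k
    z = []                                    # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z_i = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled assignment.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Posterior-mean estimate of each document's topic proportions.
    return [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
            for d, doc in enumerate(docs)]


def group_by_dominant_topic(theta):
    """Hypothetical grouping rule: label each document with its most probable topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


if __name__ == "__main__":
    # Toy corpus: two sets of "novels" with disjoint vocabularies.
    early = [["rain", "dream", "flower"] * 10 for _ in range(4)]
    late = [["city", "money", "work"] * 10 for _ in range(4)]
    theta = lda_gibbs(early + late, n_topics=2)
    print(group_by_dominant_topic(theta))
```

Grouping by the single most probable topic is the simplest possible rule; clustering the full topic-proportion vectors would make use of mixed memberships and may be closer to a classification analysis, but that choice is likewise an assumption.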