Big Data Meets Storytelling: Using Machine Learning to Predict Popular Fanfiction

Duy Nguyen¹, Stephen Zigmond¹, Samuel Glassco¹, Bach Tran¹, Philippe J. Giabbanelli¹
¹Department of Computer Science & Software Engineering, Miami University, Oxford, USA

Abstract

Fanfiction is a popular literary genre in which writers reuse a fictional universe, for example by recasting heterosexual relationships with queer characters or by bringing romance into shows that focus on horror and adventure. Fanfiction has been the subject of numerous studies in text mining and network analysis, which used Natural Language Processing (NLP) techniques to compare fanfiction with the original scripts or to make various predictions. In this paper, we use NLP to predict the popularity of a story and examine which features contribute to that popularity. This effort matters given the growing use of AI assistants and the continued interest in generating text with desirable characteristics. We collected fan stories from the two main fanfiction websites (Fanfiction.net and Archive of Our Own) about the TV series Supernatural, which has itself been the subject of many scholarly works. We extracted high-level features, such as the main character and sentiment, from 79,288 of these stories and used them in a binary classification supported by tree-based methods, an ensemble method (random forest), neural networks, and Support Vector Machines. Our optimized classifiers correctly identified popular stories in four out of five cases. By relating features to classification outcomes with SHAP values, we found that fans prefer longer stories with a richer vocabulary, which could inform the prompts given to AI chatbots so that they continue to produce such successful stories. However, we also found that fans want stories that differ from the source material (e.g., favoring romance and disliking it when characters are hurt), so AI-generated stories may be less popular if they follow a show's source material too closely.
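The abstract does not say how the features or the popularity label were computed, so the following is only a minimal, pure-Python sketch of the general pipeline under stated assumptions. Story length is taken as the log word count and vocabulary richness as the type-token ratio (both assumed proxies); a logistic regression stands in for the tree-based, random forest, neural network and SVM classifiers; and exact Shapley values are computed by enumerating feature coalitions against a single baseline (the mean feature vector), rather than with the SHAP library. The function names (`extract_features`, `train_logistic`, `shapley_values`) and the toy corpus are hypothetical.

```python
import math
import re
from itertools import combinations
from typing import Callable, List, Sequence

Vector = List[float]


def extract_features(text: str) -> Vector:
    """Hypothetical features: log word count and type-token ratio (assumed proxies)."""
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    ttr = len(set(tokens)) / n if n else 0.0  # vocabulary richness proxy
    return [math.log1p(n), ttr]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Vector], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 2000) -> Callable[[Vector], float]:
    """Batch gradient descent on the log-loss, after standardizing each feature."""
    d, m = len(X[0]), len(X)
    mu = [sum(row[j] for row in X) / m for j in range(d)]
    sd = [math.sqrt(sum((row[j] - mu[j]) ** 2 for row in X) / m) or 1.0 for j in range(d)]
    Z = [[(row[j] - mu[j]) / sd[j] for j in range(d)] for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for zi, yi in zip(Z, y):
            err = sigmoid(sum(wj * zj for wj, zj in zip(w, zi)) + b) - yi
            for j in range(d):
                gw[j] += err * zi[j]
            gb += err
        w = [wj - lr * g / m for wj, g in zip(w, gw)]
        b -= lr * gb / m

    def predict_proba(x: Vector) -> float:
        # Probability that a story with features x is "popular".
        z = [(x[j] - mu[j]) / sd[j] for j in range(d)]
        return sigmoid(sum(wj * zj for wj, zj in zip(w, z)) + b)

    return predict_proba


def shapley_values(f: Callable[[Vector], float], x: Vector, baseline: Vector) -> Vector:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for S in combinations(others, k):
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Toy corpus invented for illustration: long, varied texts labelled popular (1),
# short, repetitive texts labelled not popular (0).
popular = [" ".join(f"w{i}" for i in range(n)) for n in (300, 500, 800)]
unpopular = [" ".join(["w0", "w1"] * n) for n in (10, 20, 30)]
texts = popular + unpopular
labels = [1, 1, 1, 0, 0, 0]

X = [extract_features(t) for t in texts]
model = train_logistic(X, labels)
baseline = [sum(col) / len(X) for col in zip(*X)]  # mean feature vector as reference

story = extract_features(" ".join(f"w{i}" for i in range(600)))
print(round(model(story), 3), [round(v, 3) for v in shapley_values(model, story, baseline)])
```

Because Shapley values are additive, the attributions for a story sum to the gap between its predicted probability and the baseline prediction; with only two features the coalition enumeration is trivial, whereas tree-specific SHAP algorithms avoid this exponential enumeration for larger feature sets.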

Keywords

Fanfiction; Machine learning; Natural language processing; Text analysis; Artificial intelligence; Popularity; Fans
