The defeat of the Winograd Schema Challenge

Artificial Intelligence - Volume 325 - Page 103971 - 2023
Vid Kocijan [1], Ernest Davis [2], Thomas Lukasiewicz [3,4], Gary Marcus [5], Leora Morgenstern [6]
[1] Kumo.ai, 357 Castro Street, Suite 200, Mountain View, CA 94041, United States
[2] New York University, Department of Computer Science, 251 Mercer St, NY 10012, United States
[3] Institute of Logic and Computation, Vienna University of Technology, Austria
[4] Department of Computer Science, University of Oxford, UK
[5] New York University, New York, NY 10012, United States
[6] Palo Alto Research Center, part of SRI International, 3333 Coyote Hill Rd, Palo Alto, CA 94304, United States
