gDelta: a missing link in the grammar engineering toolchain

Springer Science and Business Media LLC - Volume 49 - Pages 51-75 - 2015

Ned Letcher¹, Rebecca Dridan², Timothy Baldwin¹

¹Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia

²Department of Informatics, University of Oslo, Oslo, Norway

Abstract

The development of precision grammars is an inherently resource-intensive process: their complexity means that a change made to one area of a grammar often introduces unexpected flow-on effects elsewhere in the grammar, which may only be discovered after some time has been invested in updating numerous test suite items. In this paper, we present gDelta, a browser-based tool that aims to give grammar engineers more immediate feedback on the impact of changes to a grammar by comparing the parser output of two different grammar versions. We describe an attribute weighting algorithm that highlights the components of the grammar most strongly affected by a modification, as well as a technique for clustering test suite items whose parsability has changed, in order to locate related groups of effects. Together, these two techniques give the grammar engineer complementary, data-driven views of the grammar, each revealing a different aspect of the change.
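The abstract does not give the weighting formula or the clustering method, so the following is only a minimal sketch of the general idea of comparing parser output from two grammar versions, not gDelta's actual algorithm. It assumes a hypothetical profile format in which each test item maps to a `Counter` of the grammar attributes (for example rule or lexical-type names) used in its parses, or `None` if it failed to parse. The weighting (normalised absolute change in attribute usage), the Jaccard-based single-link clustering, the function names and the attribute labels are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Optional, Set

# Assumed profile format (hypothetical): test item id -> Counter of the
# grammar attributes (e.g. rule or lexical-type names) used in its parses,
# or None if the item received no parse under that grammar version.
Profile = Dict[str, Optional[Counter]]


def parsability_changes(old: Profile, new: Profile) -> Dict[str, str]:
    """Label items that lost or gained a parse between the two versions."""
    changes = {}
    for item in old.keys() & new.keys():
        was_parsed = old[item] is not None
        is_parsed = new[item] is not None
        if was_parsed and not is_parsed:
            changes[item] = "lost"
        elif is_parsed and not was_parsed:
            changes[item] = "gained"
    return changes


def attribute_weights(old: Profile, new: Profile) -> Dict[str, float]:
    """Weight each attribute by the normalised change in its total usage.

    Assumed scoring, not gDelta's published formula:
    |new - old| / (new + old), so 1.0 means the attribute vanished or
    appeared entirely, and 0.0 means its usage is unchanged.
    """
    old_total: Counter = Counter()
    new_total: Counter = Counter()
    for counts in old.values():
        if counts:
            old_total.update(counts)
    for counts in new.values():
        if counts:
            new_total.update(counts)
    weights = {}
    for attr in old_total.keys() | new_total.keys():
        o, n = old_total[attr], new_total[attr]
        if o + n > 0:
            weights[attr] = abs(n - o) / (o + n)
    return weights


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two attribute sets (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_changed(old: Profile, new: Profile, changes: Dict[str, str],
                    threshold: float = 0.5) -> List[List[str]]:
    """Single-link clustering of changed items by attribute-set overlap.

    A lost item is described by its old parses, a gained item by its new
    ones; any pair with Jaccard similarity >= threshold is merged.
    """
    feats = {item: set(old[item] if kind == "lost" else new[item])
             for item, kind in changes.items()}
    parent = {item: item for item in feats}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(sorted(feats), 2):
        if jaccard(feats[a], feats[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for item in feats:
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy profiles; the attribute names are invented labels.
old = {
    "i1": Counter({"head-comp": 2, "proper-noun": 1}),
    "i2": Counter({"head-comp": 1, "adj-mod": 1}),
    "i3": None,
    "i4": Counter({"adj-mod": 1, "proper-noun": 1}),
}
new = {
    "i1": Counter({"head-comp": 2, "proper-noun": 1}),
    "i2": None,
    "i3": Counter({"coord": 1}),
    "i4": None,
}
changes = parsability_changes(old, new)
print(cluster_changed(old, new, changes, threshold=0.3))  # -> [['i2', 'i4'], ['i3']]
```

In this toy run, `adj-mod` and `coord` receive the maximum weight of 1.0 because their usage disappears or appears entirely, and the two items that lost their parses share `adj-mod`, so they fall into one cluster. Single-link grouping via union-find is simple, but it can chain loosely related items together; a practical tool would likely need a tuned threshold or a cluster-quality criterion.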
