An Arabic CCG approach for determining constituent types from Arabic Treebank

Ahmed I. El-taher1, Hitham M. Abo Bakr1, Ibrahim Zidan1, Khaled Shaalan2
1Department of Computer and System Engineering, Faculty of Engineering, Zagazig University, Zagazig, Asharkia, Egypt
2The British University, Dubai, United Arab Emirates


References

Abdel Monem, 2008, Generating Arabic text in multilingual speech-to-speech machine translation framework, Mach. Transl., 20, 205, 10.1007/s10590-009-9054-9

Abo Bakr, Hitham, Shaalan, Khaled, Ziedan, Ibrahim, 2008. A hybrid approach for converting written Egyptian colloquial dialect into diacritized Arabic. In: Proceedings of INFOS2008, the special track on Natural Language Processing, 27–29 March, Cairo, Egypt.

Bies, Ann, Ferguson, Mark, Katz, Karen, MacIntyre, Robert, 1995. Bracketing Guidelines for Treebank II Style Penn Treebank Project. Technical Report, LDC.

Bikel, Daniel M., 2002. Design of a multi-lingual, parallel-processing statistical parsing engine. In: Proceedings of HLT2002, San Diego, CA.

Bikel, 2004, Intricacies of Collins’ parsing model, Comput. Ling., 30, 479, 10.1162/0891201042544929

Birch, Alexandra, Osborne, Miles, Koehn, Philipp, 2007. CCG Supertags in Factored Statistical Machine Translation. In: Proceedings of ACL.

Bos, Johan, Bosco, Cristina, Mazzei, Alessandro, 2009. Converting a dependency Treebank to a categorial grammar Treebank for Italian. In: Proceedings of TLT 8, Milano, Italy.

Boxwell, Stephen A., Brew, Chris, 2010. A pilot Arabic CCGbank. In: Proceedings of LREC-10, Valletta, Malta.

Çakıcı, Ruken, 2005. Automatic induction of a CCG grammar for Turkish. In: Proceedings of ACL Student Research Workshop. pp. 73–78.

Clark, 2007, Wide-coverage efficient statistical parsing with CCG and log-linear models, Comput. Ling., 33, 10.1162/coli.2007.33.4.493

Collins, Michael, 1999. Head-Driven Statistical Models for Natural Language Parsing (Ph.D. thesis). Computer and Information Science, University of Pennsylvania.

Curran, James R., Clark, Stephen, Bos, Johan, 2007. Linguistically motivated large-scale NLP with C&C and Boxer. In: Proceedings of ACL demo. pp. 33–36.

Diab, Mona T., 2009. Second generation AMIRA tools for Arabic processing: fast and robust tokenization, POS tagging, and base phrase chunking. In: Proceedings of 2nd International Conference on Arabic Language Resources and Tools.

Habash, Nizar, Faraj, Reem, Roth, Ryan, 2009. Syntactic annotation in the Columbia Arabic Treebank. In: Proceedings of MEDAR, Cairo, Egypt.

Habash, Nizar, Rambow, Owen, Roth, Ryan, 2009. MADA+TOKAN: a toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Proceedings of MEDAR, Cairo, Egypt.

Hassan, Hany, 2009. Lexical Syntax for Statistical Machine Translation (Ph.D. thesis). Dublin City University.

Hockenmaier, Julia, 2006. Creating a CCGbank and a wide coverage CCG lexicon for German. In: Proceedings of the ACL, vol. 44. p. 505.

Hockenmaier, Julia, Steedman, Mark, 2005. CCGbank: User’s Manual. Technical Report MS-CIS-05-09. Department of Computer and Information Science, University of Pennsylvania.

Hockenmaier, 2007, CCGbank: a corpus of CCG derivations and dependency structures extracted from the Penn Treebank, Comput. Ling., 33, 355, 10.1162/coli.2007.33.3.355

Koehn, Philipp, Hoang, Hieu, 2007. Factored translation models. In: Proceedings of EMNLP, Prague, Czech Republic.

Koehn, Philipp, Hoang, Hieu, Birch, Alexandra, Callison-Burch, Chris, Federico, Marcello, Bertoldi, Nicola, Cowan, Brooke, Shen, Wade, Moran, Christine, Zens, Richard, Dyer, Chris, Bojar, Ondrej, Constantin, Alexandra, Herbst, Evan, 2007. Moses: open source toolkit for statistical machine translation. In: Proceedings of ACL, Demonstration Session, Prague, Czech Republic.

Kulick, Seth, Gabbard, Ryan, Marcus, Mitchell, 2006. Parsing the Arabic Treebank: analysis and improvements. In: Proceedings of TLT 6, Prague, Czech Republic.

Maamouri, Mohamed, Bies, Ann, Buckwalter, Tim, Mekki, Wigdan, 2004a. The Penn Arabic Treebank: building a large-scale annotated Arabic corpus. In: Proceedings of NEMLAR. pp. 102–109.

Maamouri, Mohamed, Bies, Ann, Kulick, Seth, 2008. Enhancing the Arabic treebank: a collaborative effort toward new annotation guidelines. In: Proceedings of LREC’08, Marrakech, Morocco.

Magerman, David M., 1994. Natural Language Parsing as Statistical Pattern Recognition (Ph.D. thesis). Department of Computer Science, Stanford University.

Othman, Eman, Shaalan, Khaled, Rafea, Ahmed, 2004. Towards resolving ambiguity in understanding Arabic sentence. In: Proceedings of the International Conference on Arabic Language Resources and Tools, NEMLAR, 22nd–23rd Sept., 2004, Egypt. pp. 118–122.

Sandillon-Rezer, 2011, Using tree transducers for grammatical inference. LACL 2011, LNAI, 6736, 235

Sang, Erik Tjong Kim, Buchholz, Sabine, 2000. Introduction to the CoNLL-2000 shared task: Chunking. In: Proceedings of the CoNLL, pp. 127–132.

Shaalan, 2014, A survey of Arabic named entity recognition and classification, Comput. Ling., 40, 2, 10.1162/COLI_a_00178

Shaalan, Khaled, Abo Bakr, Hitham, Ziedan, Ibrahim, 2009. A hybrid approach for building Arabic diacritizer. In: Proceedings of EACL 2009, Workshop on Computational Approaches to Semitic Languages, Association for Computational Linguistics, Athens, Greece, 31 March, 2009. pp. 27–35.

Smrž, Otakar, Bielický, Viktor, Kouřilová, Iveta, Kráčmar, Jakub, Hajič, Jan, Zemánek, Petr, 2008. Prague Arabic dependency treebank: a word on the million words. In: Proceedings of LREC 2008, Marrakech, Morocco. pp. 16–23.

Steedman, 1996

Steedman, 2000

Tse, Daniel, Curran, James R., 2010. Chinese CCGbank: extracting CCG derivations from the Penn Chinese Treebank. In: Proceedings of Coling 2010. pp. 1083–1091.

Zitouni, 2014, Natural language processing of Semitic languages, 10.1007/978-3-642-45358-8