Arabic vs. English: Comparative Statistical Study
Tóm tắt
Từ khóa
Tài liệu tham khảo
Yang, S.; Zhu, H.; Apostoli, A.; Cao, P.: N-gram statistics in English and Chinese: similarities and differences. In: Proceedings of IEEE International Conference on Semantic Computing, Irvine, pp. 454–460 (2007)
Al-Kadi I.: Study of information-theoretic properties of Arabic based on word entropy and Zipf’s law. J. King Saud Univ. 10, 1–14 (1996)
Attia, M.: Arabic tokenization system. In: Proceedings of the 2007 Workshop on Computational Approaches To Semitic Languages: Common Issues and Resources. Association for Computational Linguistics, Prague, pp. 65–72 (2007)
Heintz, I.: Arabic language modeling with finite state transducers. In: Proceedings of the ACL-08: HLT Student Research Workshop, Companion Volume, Columbus, pp. 37–42 (2008)
Buckwalter, T.: Buckwalter Arabic Morphological Analyzer Version 2.0. Linguistic Data Consortium (LDC) catalogue number LDC2004L02, Philadelphia, USA, ISBN 1-58563-324-0(2004)
Rashwan M., Badrashiny M., Attia M., Abdou S., Rafea A.: A stochastic Arabic diacritizer based on a hybrid of factorized and unfactorized textual features. IEEE Trans. Audio Speech Lang. Process. (TASLP) 19(1), 166–175 (2011)
Shaalan, K.; Abo Bakr, H.; Ziedan, I.: A hybrid approach for building Arabic diacritizer. In: Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages, Athens, pp. 27–35 (2009)
Kadri, Y.; Nie, J.Y.: Effective stemming for Arabic information retrieval. In: Proceedings of the challenge of Arabic for NLP/MT Conference. The British Computer Society. London (2006)
Majdi, S.; Eric, A.: Comparative evaluation of Arabic language morphological analysers and stemmers. In: Proceedings of COLING 2008 22nd International Conference on Computational Linguistics, Manchester (2008)
Rogati, M.; McCarley, S.; Yang, Y.: Unsupervised learning of Arabic stemming using a parallel corpus. In: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, Singapore, pp. 113–118 (2003)
Buckwalter, T.: Issues in Arabic orthography and morphology analysis. In: Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, Geneva (2004)
Graff, D.: Arabic Gigaword Third Edition. Linguistic Data Consortium, Philadelphia (2007)
Graff, D.; Kong, J.; Chen, K.; Maeda, K.: English Gigaword Third Edition. Linguistic Data Consortium, Philadelphia (2007)
Diab, M.; Hacioglu, K.; Jurafsky, D.: Automatic tagging of Arabic text: from raw text to base phrase chunks. 5th Meeting of the North American Chapter of the Association for Computational Linguistics/Human Language Technologies Conference (HLT-NAACL04), Boston (2004)
Habash, N.; Rambow, O.; Roth, R.: MADA+TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Proceedings of the Second International Conference on Arabic Language Resources and Tools, Cairo, pp. 102–109 (2009)
Diab, M.: Second generation AMIRA tools for Arabic processing: fast and robust tokenization, POS tagging, and phrase chunking. In: Proceedings of the Second International Conference on Arabic Language Resources and Tools, Cairo, pp. 285–288 (2009)
Alghoneim K., Alotaiby F.: Syllable based labeling for continuous Arabic speech recognition. J. Appl. Sci. Comput. 10(2), 77–86 (2003)
Manning, C.; Schütze, H.: Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge (1999)
Maamouri, M., Bies, A.; Kulick, S.; Gaddeche, F.; Mekki, W.: Arabic Treebank: Part 3(a) v. 2.6. Linguistic Data Consortium, Philadelphia, Catalog ID: LDC2007E65 (2007)