Arabic vs. English: Comparative Statistical Study

Arabian Journal for Science and Engineering - Volume 39, Issue 2 - Pages 809-820 - 2014
Fahad Alotaiby (1), Salah G. Foda (1), Ibrahim A. Al-Kharashi (2)
(1) Department of Electrical Engineering, College of Engineering, King Saud University, Riyadh, Saudi Arabia
(2) Computer Research Institute, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia

Abstract

Keywords


References

Yang, S.; Zhu, H.; Apostoli, A.; Cao, P.: N-gram statistics in English and Chinese: similarities and differences. In: Proceedings of IEEE International Conference on Semantic Computing, Irvine, pp. 454–460 (2007)

Al-Kadi, I.: Study of information-theoretic properties of Arabic based on word entropy and Zipf’s law. J. King Saud Univ. 10, 1–14 (1996)

Attia, M.: Arabic tokenization system. In: Proceedings of the 2007 Workshop on Computational Approaches To Semitic Languages: Common Issues and Resources. Association for Computational Linguistics, Prague, pp. 65–72 (2007)

Heintz, I.: Arabic language modeling with finite state transducers. In: Proceedings of the ACL-08: HLT Student Research Workshop, Companion Volume, Columbus, pp. 37–42 (2008)

Buckwalter, T.: Buckwalter Arabic Morphological Analyzer Version 2.0. Linguistic Data Consortium (LDC) catalogue number LDC2004L02, Philadelphia, USA, ISBN 1-58563-324-0 (2004)

Rashwan, M.; Badrashiny, M.; Attia, M.; Abdou, S.; Rafea, A.: A stochastic Arabic diacritizer based on a hybrid of factorized and unfactorized textual features. IEEE Trans. Audio Speech Lang. Process. (TASLP) 19(1), 166–175 (2011)

Shaalan, K.; Abo Bakr, H.; Ziedan, I.: A hybrid approach for building Arabic diacritizer. In: Proceedings of the EACL 2009 Workshop on Computational Approaches to Semitic Languages, Athens, pp. 27–35 (2009)

Kadri, Y.; Nie, J.Y.: Effective stemming for Arabic information retrieval. In: Proceedings of the Challenge of Arabic for NLP/MT Conference. The British Computer Society, London (2006)

Sawalha, M.; Atwell, E.: Comparative evaluation of Arabic language morphological analysers and stemmers. In: Proceedings of COLING 2008 22nd International Conference on Computational Linguistics, Manchester (2008)

Rogati, M.; McCarley, S.; Yang, Y.: Unsupervised learning of Arabic stemming using a parallel corpus. In: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, Singapore, pp. 113–118 (2003)

Buckwalter, T.: Issues in Arabic orthography and morphology analysis. In: Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, Geneva (2004)

Graff, D.: Arabic Gigaword Third Edition. Linguistic Data Consortium, Philadelphia (2007)

Graff, D.; Kong, J.; Chen, K.; Maeda, K.: English Gigaword Third Edition. Linguistic Data Consortium, Philadelphia (2007)

Diab, M.; Hacioglu, K.; Jurafsky, D.: Automatic tagging of Arabic text: from raw text to base phrase chunks. In: 5th Meeting of the North American Chapter of the Association for Computational Linguistics/Human Language Technologies Conference (HLT-NAACL04), Boston (2004)

Habash, N.; Rambow, O.; Roth, R.: MADA+TOKAN: A toolkit for Arabic tokenization, diacritization, morphological disambiguation, POS tagging, stemming and lemmatization. In: Proceedings of the Second International Conference on Arabic Language Resources and Tools, Cairo, pp. 102–109 (2009)

Diab, M.: Second generation AMIRA tools for Arabic processing: fast and robust tokenization, POS tagging, and phrase chunking. In: Proceedings of the Second International Conference on Arabic Language Resources and Tools, Cairo, pp. 285–288 (2009)

Alghoneim, K.; Alotaiby, F.: Syllable based labeling for continuous Arabic speech recognition. J. Appl. Sci. Comput. 10(2), 77–86 (2003)

Manning, C.; Schütze, H.: Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge (1999)

Maamouri, M.; Bies, A.; Kulick, S.; Gaddeche, F.; Mekki, W.: Arabic Treebank: Part 3(a) v. 2.6. Linguistic Data Consortium, Philadelphia, Catalog ID: LDC2007E65 (2007)