A comparison of text‐classification techniques applied to Arabic text

Wiley - Volume 60, Issue 9 - Pages 1836-1844 - 2009

Ghassan Kanaan¹, Riyad Al‐Shalabi¹, Sameh Ghwanmeh², Hamda Al‐Ma'adeed¹

¹Arab Academy for Banking and Financial Services, Amman, Jordan

²Computer Engineering Department, Yarmouk University, Jordan

Abstract

Many algorithms have been implemented for the problem of text classification. Most of the work in this area was carried out for English text; very little research has been carried out on Arabic text. The nature of Arabic text is different from that of English text, and preprocessing of Arabic text is more challenging. This paper presents an implementation of three automatic text‐classification techniques for Arabic text. A corpus of 1445 Arabic text documents belonging to nine categories has been automatically classified using the kNN, Rocchio, and naïve Bayes algorithms. The research results reveal that naïve Bayes was the best performer, followed by kNN and Rocchio.
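As a rough illustration of the three techniques named in the abstract, the sketch below classifies a query with kNN, a centroid-based Rocchio variant, and multinomial naïve Bayes. It is not the authors' implementation: the toy English corpus, the raw term-frequency weighting, the cosine similarity, the add-one smoothing, and all function names are assumptions made for this example, and the Arabic preprocessing the paper discusses is omitted entirely.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; the paper used 1445 Arabic documents in nine categories.
TRAIN = [
    ("sports", "match goal team goal"),
    ("sports", "team player match"),
    ("economy", "bank market price"),
    ("economy", "market bank loan bank"),
]

def tf(text):
    """Raw term-frequency vector (assumed weighting; no stemming or stop words)."""
    return Counter(text.split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn(text, k=3):
    """Majority vote among the k training documents most similar to the query."""
    q = tf(text)
    ranked = sorted(TRAIN, key=lambda d: cosine(q, tf(d[1])), reverse=True)[:k]
    votes = Counter(label for label, _ in ranked)
    return votes.most_common(1)[0][0]

def rocchio(text):
    """Simplified Rocchio: one prototype per class built from positive examples
    only (no negative-example term); the query goes to the nearest prototype."""
    prototypes = defaultdict(Counter)
    for label, doc in TRAIN:
        prototypes[label].update(tf(doc))  # summed vector; cosine ignores scale
    q = tf(text)
    return max(prototypes, key=lambda c: cosine(q, prototypes[c]))

def naive_bayes(text):
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, doc in TRAIN:
        word_counts[label].update(doc.split())
        doc_counts[label] += 1
    vocab = {w for counts in word_counts.values() for w in counts}

    def log_posterior(c):
        total = sum(word_counts[c].values())
        score = math.log(doc_counts[c] / len(TRAIN))  # class prior
        for w in text.split():
            score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        return score

    return max(doc_counts, key=log_posterior)

if __name__ == "__main__":
    for query in ("bank loan price", "goal team"):
        print(query, knn(query), rocchio(query), naive_bayes(query))
```

On a corpus this small all three classifiers agree; the ranking reported in the abstract only emerges on a realistic collection with held-out evaluation.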
