Extracting the roots of Arabic words without removing affixes
Abstract
Most research on Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove letters that are not affixes, which leads to the extraction of incorrect roots. This paper proposes a new approach to this problem by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, called the Word Substring Stemming (WSS) Algorithm, does not remove affixes during extraction. Instead, it produces the set of all substrings of an Arabic word and uses an Arabic roots file, an Arabic patterns file and a concrete set of rules to extract the correct roots from those substrings. Experiments show that the proposed approach is competitive, with an accuracy of 83.9%. Furthermore, there is room to improve this accuracy, because for about 9.9% of the tested words the WSS algorithm returns two candidates (in most cases) for the correct root.
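The substring idea can be illustrated with a minimal sketch. It is not the WSS algorithm itself: the abstract does not specify the rules, the patterns file, or how substrings are formed, so the sketch assumes contiguous substrings of three or four letters and a tiny in-memory root set. The names `substrings`, `candidate_roots` and `SAMPLE_ROOTS` are hypothetical and exist only for illustration.

```python
def substrings(word, min_len=3, max_len=4):
    """Return all contiguous substrings of `word` whose length is in
    [min_len, max_len]. The length bounds are an assumption made here,
    not something stated in the abstract."""
    return {
        word[start:start + size]
        for size in range(min_len, max_len + 1)
        for start in range(len(word) - size + 1)
    }


def candidate_roots(word, roots):
    """Keep only the substrings that appear in the given root set.
    No affix is stripped from `word` at any point."""
    return sorted(s for s in substrings(word) if s in roots)


# Hypothetical stand-in for the Arabic roots file mentioned in the abstract.
SAMPLE_ROOTS = {"كتب", "درس", "علم"}

print(candidate_roots("مكتبة", SAMPLE_ROOTS))
print(candidate_roots("يدرسون", SAMPLE_ROOTS))
```

With this sample set, each word yields a single candidate root. A word whose substrings match more than one entry in the root set would yield several candidates, which is the situation the abstract reports for about 9.9% of the tested words; choosing among them would need the patterns file and rules that this sketch leaves out.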