The design and implementation of an excellent text categorization system

Mingyu Lu, LiLi Diao, Yuchang Lu, Lizhu Zhou
Department of Computer Science and Technology, Tsinghua University, Beijing, China

Abstract

Based on a study of text classification techniques, we propose a new text categorization method that uses a weight adjustment measure to improve the vector space model and the naive Bayesian classifier. We also implement an experimental text classification system, CWZ, to compare various text classification approaches. Compared with many commercial text classification systems, CWZ performs much better. We introduce its framework, functions, main modules and running environment, present our experimental results, and discuss several important technical issues involved in the system, from which we draw some useful conclusions. We also describe how the vector space model and the naive Bayesian classifier are improved.
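The abstract does not specify CWZ's weight adjustment measure, so the following is only a minimal sketch of the general idea: a per-term weight is applied to both a vector space model (cosine similarity against class centroids) and a multinomial naive Bayesian classifier. It assumes documents are already tokenized into lists of terms and uses inverse document frequency as a stand-in weight; all function names (`idf_weights`, `vsm_train`, `vsm_classify`, `nb_train`, `nb_classify`) and the toy data are hypothetical and are not taken from CWZ.

```python
import math
from collections import Counter, defaultdict

def idf_weights(docs):
    """Stand-in weight adjustment (an assumption, not CWZ's measure): IDF + 1 per term."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) + 1.0 for t in df}

def vsm_train(docs, labels, w):
    """Vector space model: one weighted, L2-normalised centroid vector per class."""
    centroids = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        for t, tf in Counter(doc).items():
            centroids[y][t] += tf * w[t]
    for vec in centroids.values():
        norm = math.sqrt(sum(v * v for v in vec.values()))
        for t in vec:
            vec[t] /= norm
    return centroids

def vsm_classify(doc, centroids, w):
    """Return the class whose centroid has the highest cosine similarity to doc."""
    q = {t: tf * w.get(t, 0.0) for t, tf in Counter(doc).items()}
    qn = math.sqrt(sum(v * v for v in q.values())) or 1.0
    return max(centroids,
               key=lambda y: sum(v * centroids[y].get(t, 0.0) for t, v in q.items()) / qn)

def nb_train(docs, labels, w, alpha=1.0):
    """Multinomial naive Bayes where each occurrence of term t counts w[t] instead of 1."""
    vocab = set(w)
    prior = Counter(labels)
    counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        for t in doc:
            counts[y][t] += w[t]
    model = {}
    for y in prior:
        # Laplace (add-alpha) smoothing over the training vocabulary.
        total = sum(counts[y].values()) + alpha * len(vocab)
        loglik = {t: math.log((counts[y][t] + alpha) / total) for t in vocab}
        model[y] = (math.log(prior[y] / len(labels)), loglik)
    return model

def nb_classify(doc, model):
    """Pick the class with the largest log posterior; unseen terms are ignored."""
    def score(y):
        logprior, loglik = model[y]
        return logprior + sum(loglik[t] for t in doc if t in loglik)
    return max(model, key=score)

# Hypothetical toy corpus for illustration only.
docs = [["ball", "goal", "team"], ["goal", "match", "team"],
        ["stock", "market", "price"], ["price", "trade", "market"]]
labels = ["sports", "sports", "finance", "finance"]
w = idf_weights(docs)
centroids = vsm_train(docs, labels, w)
model = nb_train(docs, labels, w)
print(vsm_classify(["team", "goal"], centroids, w))  # sports
print(nb_classify(["market", "price"], model))       # finance
```

Sharing one weight table between both classifiers mirrors the abstract's description of a single weight adjustment that improves both models; any other weighting scheme could be substituted for `idf_weights` without changing the classifiers.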

Keywords

Text categorization; Bayesian methods; Space technology; Computer science; Electronic mail; Weight measurement; Extraterrestrial measurements; Text mining; Intelligent control; Automation