A proposed system for segmentation of information sources in portals and search engines repositories

I. Anagnostopoulos1, C. Anagnostopoulos1, I. Papaleonidopoulos1, V. Loumos1, E. Kayafas1
1Department of Electrical & Computer Engineering, National Technical University of Athens (NTUA), Athens, Greece

Abstract

A huge volume of information is now available on the Web, and it reaches users in a chaotic way. To be easily accessible, this information must be clustered and classified into appropriate knowledge areas. Many heavily visited sites and portals therefore attempt to unify access to multiple information sources, thereby providing a classification of the information. This paper proposes a system that classifies e-commerce sites according to their Web content. The system can be used for automatic knowledge segmentation in a portal or in a search engine repository. After the learning phase, system performance reached 96% on the first test sets, and it increased further (up to 98%) as the number of test sets increased.

Keywords

#Portals #Search engines #System testing #Business #Feedback #Information filtering #Information retrieval #Chaos #System performance #Information systems
