Web mining in soft computing framework: relevance, state of the art and future directions
Abstract
The paper summarizes the different characteristics of Web data, the basic components of Web mining and its different types, and the current state of the art. The reason for considering Web mining as a field separate from data mining is explained. The limitations of some of the existing Web mining methods and tools are enunciated, and the significance of soft computing (comprising fuzzy logic (FL), artificial neural networks (ANNs), genetic algorithms (GAs), and rough sets (RSs)) is highlighted. A survey of the existing literature on "soft Web mining" is provided, along with the commercially available systems. The prospective areas of Web mining where the application of soft computing needs immediate attention are outlined with justification. The scope for future research in developing "soft Web mining" systems is explained. An extensive bibliography is also provided.
Keywords
#Web mining #Fuzzy logic #Artificial intelligence #Data mining #Artificial neural networks #Genetic algorithms #Computer networks #Rough sets #Information retrieval #Search engines