The WEKA Data Mining Software
Abstract
More than twelve years have passed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, has evolved substantially, and now accompanies a text on data mining [35]. Today, WEKA is widely accepted in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an overview of the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, summarizes what has been added since the last stable version (Weka 3.4), released in 2003.
References
K. Bennett and M. Embrechts. An optimization perspective on kernel partial least squares regression. In J.S. et al., editor, Advances in Learning Theory: Methods, Models and Applications, volume 190 of NATO Science Series, Series III: Computer and System Sciences, pages 227--249. IOS Press, Amsterdam, The Netherlands, 2003.
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, California, 1984.
S. Celis and D.R. Musicant. Weka-parallel: machine learning in parallel. Technical report, Carleton College, CS TR, 2002.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
A. Genkin, D.D. Lewis, and D. Madigan. Large-scale Bayesian logistic regression for text categorization. Technical report, DIMACS, 2004.
M. Hall and E. Frank. Combining naive Bayes and decision tables. In Proc 21st Florida Artificial Intelligence Research Society Conference, Miami, Florida. AAAI Press, 2008.
K. Hornik, A. Zeileis, T. Hothorn, and C. Buchta. RWeka: An R Interface to Weka, 2009. R package version 0.3-16.
L. Jiang and H. Zhang. Weightily averaged one-dependence estimators. In Proceedings of the 9th Biennial Pacific Rim International Conference on Artificial Intelligence, PRICAI 2006, volume 4099 of LNAI, pages 970--974, 2006.
R. Khoussainov, X. Zuo, and N. Kushmerick. Grid-enabled Weka: A toolkit for machine learning on the grid. ERCIM News, 59, 2004.
M.-A. Krogel and S. Wrobel. Facets of aggregation approaches to propositionalization. In T. Horvath and A. Yamamoto, editors, Work-in-Progress Track at the Thirteenth International Conference on Inductive Logic Programming (ILP), 2003.
D. Nadeau. Balie-baseline information extraction: Multilingual information extraction from text with machine learning and natural language techniques. Technical report, University of Ottawa, 2005.
G. Piatetsky-Shapiro. KDnuggets news on SIGKDD service award. http://www.kdnuggets.com/news/2005/n13/2i.html, 2005.
R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2006. ISBN 3-900051-07-0.
K. Sandberg. The Haar wavelet transform. http://amath.colorado.edu/courses/5720/2000Spr/Labs/Haar/haar.html, 2000.
C. Shearer. The CRISP-DM model: The new blueprint for data mining. Journal of Data Warehousing, 5(4), 2000.
H. Shi. Best-first decision tree learning. Master's thesis, University of Waikato, Hamilton, NZ, 2007. COMP594.
K.M. Ting and I.H. Witten. Stacking bagged and dagged models. In D. H. Fisher, editor, Fourteenth International Conference on Machine Learning, pages 367--375, San Francisco, CA, 1997. Morgan Kaufmann Publishers.
I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco, 2000.
I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.
I.H. Witten, G.W. Paynter, E. Frank, C. Gutwin, and C.G. Nevill-Manning. Kea: Practical automatic keyphrase extraction. In Y.-L. Theng and S. Foo, editors, Design and Usability of Digital Libraries: Case Studies in the Asia Pacific, pages 129--152. Information Science Publishing, London, 2005.
X. Xu. Statistical learning in multiple instance problems. Master's thesis, Department of Computer Science, University of Waikato, 2003.