Open-source machine learning: R meets Weka

Computational Statistics - Volume 24 - Pages 225-232 - 2008
Kurt Hornik (1), Christian Buchta (2), Achim Zeileis (1)
(1) Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Vienna, Austria
(2) Institute for Tourism and Leisure Studies, Wirtschaftsuniversität Wien, Vienna, Austria

Abstract

Two of the prime open-source environments available for machine/statistical learning in data mining and knowledge discovery are the software packages Weka and R, which have emerged from the machine learning and statistics communities, respectively. To make the different sets of tools from both environments available in a single unified system, an R package, RWeka, is proposed which interfaces Weka’s functionality to R. With only a thin layer of (mostly R) code, it provides a set of general interface generators that set up interface functions with the usual “R look and feel”, re-using Weka’s standardized interface of learner classes (including classifiers, clusterers, associators, filters, loaders, savers, and stemmers) with associated methods.
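
The interface-generator idea described in the abstract can be illustrated with a short R session. This is a minimal sketch, not taken from the paper: it assumes that the RWeka package (with rJava and a working Java runtime) is installed, and the object names `NB`, `fit`, and `pred`, as well as the choice of the built-in `iris` data, are illustrative only.

```r
# Assumes RWeka (and its Java dependencies) are installed.
library(RWeka)

# Generate an R interface function for a Weka classifier class.
# The returned function accepts the usual formula/data arguments.
NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")

# Fit the learner with the familiar R modelling syntax.
fit <- NB(Species ~ ., data = iris)

# Standard R generics such as predict() work on the fitted object.
pred <- predict(fit, newdata = iris)

# Cross-tabulate predicted against observed classes.
print(table(predicted = pred, observed = iris$Species))
```

The generated function `NB` behaves like an ordinary R model-fitting function, which is one plausible reading of the "R look and feel" the abstract refers to; the package also ships ready-made interfaces for several Weka learners, so calling the generator by hand is only needed for classes that are not already registered.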
