A survey on concept drift adaptation
Abstract
Concept drift primarily refers to an online supervised learning scenario in which the relation between the input data and the target variable changes over time. Assuming a general knowledge of supervised learning, in this article we characterize adaptive learning processes; categorize existing strategies for handling concept drift; give an overview of the most representative, distinct, and popular techniques and algorithms; discuss the evaluation methodology of adaptive algorithms; and present a set of illustrative applications. The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art. Thus, it aims to provide a comprehensive introduction to concept drift adaptation for researchers, industry analysts, and practitioners.
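To make the definition concrete, the following is a minimal sketch of a synthetic stream in which the relation between input and target changes at a known point. It is not taken from the survey: the names `make_stream`, `static_model`, and `error_rate`, the threshold 0.5, and the drift position are all illustrative assumptions. The input distribution stays the same throughout; only the rule that maps inputs to labels changes.

```python
import random

def make_stream(n, drift_at, seed=0):
    """Yield (x, y) pairs; the rule mapping x to y flips at index drift_at."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()           # input distribution is the same before and after
        if t < drift_at:
            y = int(x > 0.5)       # concept before the drift
        else:
            y = int(x <= 0.5)      # concept after the drift (labels inverted)
        yield x, y

def static_model(x):
    """A fixed classifier that matches the first concept and never adapts."""
    return int(x > 0.5)

def error_rate(pairs):
    """Fraction of examples on which the static model is wrong."""
    pairs = list(pairs)
    return sum(static_model(x) != y for x, y in pairs) / len(pairs)

stream = list(make_stream(n=2000, drift_at=1000))
error_before = error_rate(stream[:1000])   # model agrees with the first concept
error_after = error_rate(stream[1000:])    # same model, changed concept
print(error_before, error_after)
```

In this deliberately extreme toy case the non-adaptive model goes from making no errors to being wrong on every example once the concept changes, which is the situation adaptive learners are meant to handle.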