Learning from streaming data with concept drift and imbalance: an overview

Progress in Artificial Intelligence, Vol. 1, No. 1, pp. 89–101, 2012
T. Ryan Hoens1, Robi Polikar2, Nitesh V. Chawla1
1University of Notre Dame
2Rowan University

Abstract

Keywords


References

Alippi, C., Boracchi, G., Roveri, M.: Just in time classifiers: managing the slow drift case. In: IJCNN, pp. 114–120. IEEE, New York (2009). doi: 10.1109/IJCNN.2009.5178799

Alippi, C., Roveri, M.: Just-in-time adaptive classifiers in non-stationary conditions. In: IJCNN, pp. 1014–1019. IEEE, New York (2007)

Alippi C., Roveri M.: Just-in-time adaptive classifiers, part II: designing the classifier. TNN 19(12), 2053–2064 (2008)

Andres-Andres, A., Gomez-Sanchez, E., Bote-Lorenzo, M.: Incremental rule pruning for fuzzy ARTMAP neural network. In: ICANN, pp. 655–660 (2005)

Becker, H., Arias, M.: Real-time ranking with concept drift using expert advice. In: KDD, pp. 86–94. ACM, New York (2007)

Bifet, A., Gavaldà, R.: Learning from time-changing data with adaptive windowing. In: SDM, pp. 443–448 (2007)

Bifet, A., Gavaldà, R.: Adaptive learning from evolving data streams. In: IDA, pp. 249–260 (2009)

Bifet, A., Holmes, G., Pfahringer, B., Kirkby, R., Gavaldà, R.: New ensemble methods for evolving data streams. In: KDD, pp. 139–148. ACM, New York (2009)

Black M., Hickey R.: Learning classification rules for telecom customer call data under concept drift. Soft Comput. Fusion Found. Methodol. Appl. 8(2), 102–108 (2003)

Breiman L.: Bagging predictors. Mach. Learn. 24(2), 123–140 (1996). doi: 10.1023/A:1018054314350

Breiman L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). doi: 10.1023/A:1010933404324

Buntine W.: Learning classification trees. Stat. Comput. 2(2), 63–73 (1992)

Carpenter G., Grossberg S., Markuzon N., Reynolds J., Rosen D.: Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps. TNN 3(5), 698–713 (1992)

Carpenter G., Grossberg S., Reynolds J.: ARTMAP: supervised real-time learning and classification of nonstationary data by a self-organizing neural network. Neural Netw. 4(5), 565–588 (1991)

Carpenter G., Tan A.: Rule extraction: from neural architecture to symbolic representation. Connect. Sci. 7(1), 3–27 (1995)

Chawla N., Japkowicz N., Kotcz A.: Editorial: special issue on learning from imbalanced data sets. ACM SIGKDD Explor. Newsl. 6(1), 1–6 (2004)

Chawla, N., Lazarevic, A., Hall, L., Bowyer, K.: SMOTEBoost: improving prediction of the minority class in boosting. In: PKDD, pp. 107–119 (2003)

Chawla, N.V.: Data mining for imbalanced datasets: an overview. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 875–886. Springer, Berlin (2010)

Chawla N.V., Cieslak D.A., Hall L.O., Joshi A.: Automatically countering imbalance and its empirical relationship to cost. DMKD 17(2), 225–252 (2008)

Chen, S., He, H.: SERA: selectively recursive approach towards nonstationary imbalanced stream data mining. In: IJCNN, pp. 522–529. IEEE, New York (2009)

Chu, F., Zaniolo, C.: Fast and light boosting for adaptive mining of data streams. In: PAKDD, pp. 282–292 (2004)

Dietterich, T.: Ensemble methods in machine learning. In: MCS, pp. 1–15 (2000)

Ditzler, G., Polikar, R.: An incremental learning framework for concept drift and class imbalance. In: IJCNN. IEEE, New York (2010)

Ditzler, G., Polikar, R., Chawla, N.V.: An incremental learning algorithm for nonstationary environments and class imbalance. In: ICPR. IEEE, New York (2010)

Domingos, P., Hulten, G.: Mining high-speed data streams. In: KDD, pp. 71–80. ACM, New York (2000)

Elwell, R., Polikar, R.: Incremental learning in nonstationary environments with controlled forgetting. In: IJCNN, pp. 771–778. IEEE, New York (2009)

Elwell, R., Polikar, R.: Incremental learning of variable rate concept drift. In: MCS, pp. 142–151 (2009)

Elwell R., Polikar R.: Incremental learning of concept drift in nonstationary environments. TNN 22(10), 1517–1531 (2011)

Fan, W.: Systematic data selection to mine concept-drifting data streams. In: KDD, pp. 128–137. ACM, New York (2004)

Freund, Y., Schapire, R.: Experiments with a new boosting algorithm. In: ICML (1996). doi: 10.1007/3-540-59119-2_166

Friedman J., Hastie T., Tibshirani R.: Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 28(2), 337–407 (2000)

Fu L.: Incremental knowledge acquisition in supervised learning networks. SMC Part A 26(6), 801–809 (1996)

Fukunaga K., Hostetler L.: Optimization of k nearest neighbor density estimates. Inf. Theory 19(3), 320–326 (1973)

Gama, J., Medas, P., Castillo, G., Rodrigues, P.: Learning with drift detection. In: AAI, pp. 66–112 (2004)

Gao J., Ding B., Fan W., Han J., Yu P.: Classifying data streams with skewed class distributions and concept drifts. Internet Comput. 12(6), 37–49 (2008)

Gao, J., Fan, W., Han, J., Yu, P.: A general framework for mining concept-drifting data streams with skewed distributions. In: SDM, pp. 3–14 (2007)

Giraud-Carrier C.: A note on the utility of incremental learning. AI Commun. 13(4), 215–223 (2000)

Grossberg S.: Nonlinear neural networks: principles, mechanisms, and architectures. Neural Netw. 1(1), 17–61 (1988)

Guo, H., Viktor, H.L.: Learning from imbalanced data sets with boosting and data generation: the DataBoost-IM approach. SIGKDD Explor. Newsl. 6, 30–39 (2004). doi: 10.1145/1007730.1007736

Ho T.: The random subspace method for constructing decision forests. PAMI 20(8), 832–844 (1998)

Hoeffding W.: Probability inequalities for sums of bounded random variables. JASA 58(301), 13–30 (1963)

Hoeglinger, S., Pears, R.: Use of Hoeffding trees in concept based data stream mining. In: ICIAFS, pp. 57–62 (2007). doi: 10.1109/ICIAFS.2007.4544780

Hulten, G., Spencer, L., Domingos, P.: Mining time-changing data streams. In: KDD, pp. 97–106. ACM, New York (2001)

Joachims, T.: Estimating the generalization performance of an SVM efficiently. In: ICML, p. 431. Morgan Kaufmann, Menlo Park (2000)

Karnick, M., Ahiskali, M., Muhlbaier, M., Polikar, R.: Learning concept drift in nonstationary environments using an ensemble of classifiers based approach. In: IJCNN, pp. 3455–3462. IEEE, New York (2008)

Karnick, M., Muhlbaier, M., Polikar, R.: Incremental learning in non-stationary environments with concept drift using a multiple classifier based approach. In: ICPR, pp. 1–4. IEEE, New York (2009)

Kelly, M., Hand, D., Adams, N.: The impact of changing populations on classifier performance. In: KDD, pp. 367–371. ACM, New York (1999)

Klinkenberg, R., Joachims, T.: Detecting concept drift with support vector machines. In: ICML (2000)

Kohavi, R., Kunz, C.: Option decision trees with majority votes. In: ICML, pp. 161–169. Morgan Kaufmann, Menlo Park (1997)

Kolter, J., Maloof, M.: Dynamic weighted majority: a new ensemble method for tracking concept drift. In: ICDM, pp. 123–130. IEEE, New York (2003)

Kolter, J., Maloof, M.: Using additive expert ensembles to cope with concept drift. In: ICML, pp. 449–456. ACM, New York (2005)

Kolter J., Maloof M.: Dynamic weighted majority: an ensemble method for drifting concepts. JMLR 8, 2755–2790 (2007)

Kubat M.: Floating approximation in time-varying knowledge bases. PRL 10(4), 223–227 (1989)

Kuncheva L.I., Whitaker C.J.: Measures of diversity in classifier ensembles. Mach. Learn. 51, 181–207 (2003)

Lange S., Grieser G.: On the power of incremental learning. TCS 288(2), 277–307 (2002)

Lange, S., Zilles, S.: Formal models of incremental learning and their analysis. In: IJCNN, vol. 4, pp. 2691–2696. IEEE, New York (2003)

Last M.: Online classification of nonstationary data streams. IDA 6(2), 129–147 (2002)

Lazarescu M., Venkatesh S., Bui H.: Using multiple windows to track concept drift. IDA 8(1), 29–59 (2004)

Lichtenwalter, R., Chawla, N.V.: Adaptive methods for classification in arbitrarily imbalanced and drifting data streams. In: New Frontiers in Applied Data Mining. Lecture Notes in Computer Science, vol. 5669, pp. 53–75. Springer, Berlin (2010)

Maron, O., Moore, A.W.: Hoeffding races: accelerating model selection search for classification and function approximation. In: NIPS, pp. 59–66 (1993)

Masnadi-Shirazi H., Vasconcelos N.: Cost-sensitive boosting. PAMI 33(2), 294–309 (2011). doi: 10.1109/TPAMI.2010.71

Mitchell T., Caruana R., Freitag D., McDermott J., Zabowski D.: Experience with a learning personal assistant. Commun. ACM 37(7), 80–91 (1994)

Moreno-Torres, J., Herrera, F.: A preliminary study on overlapping and data fracture in imbalanced domains by means of genetic programming-based feature extraction. In: ISDA, pp. 501–506 (2010). doi: 10.1109/ISDA.2010.5687214

Moreno-Torres, J., Raeder, T., Alaiz-Rodríguez, R., Chawla, N.V., Herrera, F.: A unifying view on dataset shift in classification. Pattern Recognit. 45, 521–530 (2011)

Muhlbaier, M., Polikar, R.: An ensemble approach for incremental learning in nonstationary environments. In: MCS, pp. 490–500 (2007)

Muhlbaier, M., Polikar, R.: Multiple classifiers based incremental learning algorithm for learning in nonstationary environments. In: ICMLC, vol. 6, pp. 3618–3623. IEEE, New York (2007)

Muhlbaier M., Topalis A., Polikar R.: Learn++.NC: combining ensemble of classifiers with dynamically weighted consult-and-vote for efficient incremental learning of new classes. TNN 20(1), 152–168 (2009). doi: 10.1109/TNN.2008.2008326

Nishida, K., Yamauchi, K., Omori, T.: ACE: adaptive classifiers-ensemble system for concept-drifting environments. In: MCS, pp. 176–185 (2005)

Pfahringer, B., Holmes, G., Kirkby, R.: New options for Hoeffding trees. In: AAI, pp. 90–99 (2007)

Polikar R.: Ensemble based systems in decision making. Circuits Syst. Mag. 6(3), 21–45 (2006)

Polikar R.: Bootstrap-inspired techniques in computational intelligence. Signal Process. Mag. 24(4), 59–72 (2007)

Polikar R., Upda L., Upda S.S., Honavar V.: Learn++: an incremental learning algorithm for supervised neural networks. SMC Part C 31(4), 497–508 (2001)

Quinlan, J.: C4.5: Programs for Machine Learning. Morgan Kaufmann, Menlo Park (1993)

Schapire R., Singer Y.: Improved boosting algorithms using confidence-rated predictions. Mach. Learn. 37(3), 297–336 (1999)

Scholz M., Klinkenberg R.: Boosting classifiers for drifting concepts. IDA 11(1), 3–28 (2007)

Stanley, K.: Learning concept drift with a committee of decision trees. Technical Report AI-03-302, Computer Science Department, University of Texas-Austin (2003)

Street, W., Kim, Y.: A streaming ensemble algorithm (SEA) for large-scale classification. In: KDD, pp. 377–382. ACM, New York (2001)

Ting, K.: A comparative study of cost-sensitive boosting algorithms. In: ICML (2000)

Tsymbal, A.: The problem of concept drift: definitions and related work. Technical Report TCD-CS-2004-15, Department of Computer Science, Trinity College (2004). https://www.cs.tcd.ie/publications/techreports/reports

Tsymbal, A., Pechenizkiy, M., Cunningham, P., Puuronen, S.: Handling local concept drift with dynamic integration of classifiers: domain of antibiotic resistance in nosocomial infections. In: CBMS, pp. 679–684 (2006). doi: 10.1109/CBMS.2006.94

Tsymbal A., Pechenizkiy M., Cunningham P., Puuronen S.: Dynamic integration of classifiers for handling concept drift. Inf. Fusion 9(1), 56–68 (2008)

Wang, H., Fan, W., Yu, P., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: KDD, pp. 226–235. ACM, New York (2003)

Wang, H., Yin, J., Pei, J., Yu, P., Yu, J.: Suppressing model overfitting in mining concept-drifting data streams. In: KDD, pp. 736–741. ACM, New York (2006)

Widmer, G., Kubat, M.: Learning flexible concepts from streams of examples: FLORA2. In: ECAI, p. 467. Wiley, New York (1992)

Widmer, G., Kubat, M.: Effective learning in dynamic environments by explicit context tracking. In: ECML, pp. 227–243. Springer, Berlin (1993)

Widmer G., Kubat M.: Learning in the presence of concept drift and hidden contexts. Mach. Learn. 23(1), 69–101 (1996)