Partially labeled data stream classification with the semi-supervised K-associated graph

Springer Science and Business Media LLC - Volume 18, Issue 4 - Pages 299-310 - 2012
João Roberto Bertini (1), Alneu de Andrade Lopes (1), Liang Zhao (1)
(1) Instituto de Ciências Matemáticas e de Computação, USP, São Carlos, Brazil

Abstract

Conventional data classification techniques rest on two strong assumptions: (1) a reasonably large labeled data set is available for training; and (2) future input instances conform to the distribution of the training set, i.e., the data distribution is stationary over time. In data stream classification, both assumptions are difficult to satisfy. In this paper, we present a graph-based semi-supervised approach that extends the static classifier based on the K-associated Optimal Graph to perform online semi-supervised classification. To learn from both labeled and unlabeled patterns, we adapt the optimal graph construction so that it simultaneously spreads the labels through the training set. The sparse, disconnected nature of the proposed graph structure gives it the flexibility to cope with non-stationary classification. An experimental comparison between the proposed method and three state-of-the-art ensemble classification methods is provided, with promising results.
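To make the idea of spreading labels while building a sparse graph more concrete, the following is a minimal Python sketch. It is not the K-associated Optimal Graph construction from the paper, which the abstract does not specify; it is a simplified stand-in that assumes Euclidean distance, a plain k-nearest-neighbor graph, and breadth-first propagation. The function name `spread_labels` and its parameters are hypothetical.

```python
import math
from collections import deque


def spread_labels(points, labels, k):
    """Illustrative stand-in, NOT the paper's algorithm.

    points: list of coordinate tuples
    labels: list with a class label, or None for unlabeled points
    k:      number of nearest neighbors each point links to
    Returns a new label list; points unreachable from any labeled
    point stay None.
    """
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Rank all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            # Assumption: never link two labeled points of different
            # classes, so the graph stays sparse and may be disconnected.
            if (labels[i] is not None and labels[j] is not None
                    and labels[i] != labels[j]):
                continue
            adj[i].add(j)
            adj[j].add(i)

    out = list(labels)
    # Multi-source breadth-first search starting from the labeled points.
    queue = deque(i for i in range(n) if out[i] is not None)
    while queue:
        i = queue.popleft()
        for j in sorted(adj[i]):
            if out[j] is None:
                out[j] = out[i]  # unlabeled point inherits the label
                queue.append(j)
    return out


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["a", None, None, "b", None, None]
print(spread_labels(points, labels, k=2))
```

In this toy example the two clusters share no edges, so each labeled point only reaches the unlabeled points in its own component. A streaming variant would rebuild or update such components as new batches arrive, which is where a disconnected structure would help with drift; that part is not shown here.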
