A survey on semi-supervised learning

Machine Learning - Volume 109, Issue 2 - Pages 373-440 - 2020
Jesper E. van Engelen¹, Holger H. Hoos¹
¹Leiden Institute of Advanced Computer Science, Leiden University, Leiden, The Netherlands

Abstract

Semi-supervised learning is the branch of machine learning concerned with using labelled as well as unlabelled data to perform certain learning tasks. Conceptually situated between supervised and unsupervised learning, it permits harnessing the large amounts of unlabelled data available in many use cases in combination with typically smaller sets of labelled data. In recent years, research in this area has followed the general trends observed in machine learning, with much attention directed at neural network-based models and generative learning. The literature on the topic has also expanded in volume and scope, now encompassing a broad spectrum of theory, algorithms and applications. However, no recent surveys exist to collect and organize this knowledge, impeding the ability of researchers and engineers alike to utilize it. Filling this void, we present an up-to-date overview of semi-supervised learning methods, covering earlier work as well as more recent advances. We focus primarily on semi-supervised classification, where the large majority of semi-supervised learning research takes place. Our survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work. Furthermore, we propose a new taxonomy of semi-supervised classification algorithms, which sheds light on the different conceptual and methodological approaches for incorporating unlabelled data into the training process. Lastly, we show how the fundamental assumptions underlying most semi-supervised learning algorithms are closely connected to each other, and how they relate to the well-known semi-supervised clustering assumption.

Tài liệu tham khảo

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., & Isard, M., et al. (2016). Tensorflow: A system for large-scale machine learning. In 12th USENIX symposium on operating systems design and implementation (OSDI 16) (pp. 265–283).

Abney, S. (2002). Bootstrapping. In Proceedings of the 40th annual meeting on association for computational linguistics, association for computational linguistics (pp. 360–367).

Anderberg, M. R. (1973). Cluster analysis for applications. Cambridge: Academic Press.

Azran, A. (2007). The rendezvous algorithm: Multiclass semi-supervised learning with Markov random walks. In Proceedings of the 24th international conference on machine learning (pp. 49–56).

Bachman, P., Alsharif, O., & Precup, D. (2014). Learning with pseudo-ensembles. In Advances in neural information processing systems (pp. 3365–3373).

Bair, E. (2013). Semi-supervised clustering methods. Wiley Interdisciplinary Reviews: Computational Statistics, 5(5), 349–361.

Balcan, M. F., Blum, A., & Yang, K. (2005). Co-training and expansion: Towards bridging theory and practice. In Advances in neural information processing systems (pp. 89–96).

Baluja, S., Seth, R., Sivakumar, D., Jing, Y., Yagnik, J., Kumar, S., Ravichandran, D., & Aly, M. (2008). Video suggestion and discovery for youtube: Taking random walks through the view graph. In Proceedings of the 17th international conference on world wide web (pp. 895–904). ACM.

Barabási, A. L. (2016). Network science. Cambridge: Cambridge University Press.

Basu, S., Banerjee, A., & Mooney, R. (2002). Semi-supervised clustering by seeding. In Proceedings of the 19th international conference on machine learning (pp. 27–34).

Belkin, M., Matveeva, I., & Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. In Proceedings of the international conference on computational learning theory (pp. 624–638). Springer.

Belkin, M., Niyogi, P., & Sindhwani, V. (2005). On manifold regularization. In Proceedings of the 10th international conference on artificial intelligence and statistics (pp. 17–24).

Belkin, M., Niyogi, P., & Sindhwani, V. (2006). Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7, 2399–2434.

Ben-David, S., Lu, T., Pál, D., & Sotáková, M. (2009). Learning low density separators. In Proceedings of the 12th international conference on artificial intelligence and statistics (pp. 25–32).

Bengio, Y., Delalleau, O., & Le Roux, N. (2006). Chapter 11. Label propagation and quadratic criterion. In O. Chapelle, B. Schölkopf, & A. Zien (Eds.), Semi-supervised learning (pp. 193–216). Cambridge: The MIT Press.

Bennett, K. P., & Demiriz, A. (1999). Semi-supervised support vector machines. In Advances in neural information processing systems (pp. 368–374).

Bennett, K. P., Demiriz, A., & Maclin, R. (2002). Exploiting unlabeled data in ensemble methods. In Proceedings of the 8th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 289–296). ACM.

Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., & Raffel, C. (2019). Mixmatch: A holistic approach to semi-supervised learning. arXiv:1905.02249.

Bishop, C. M. (2006). Pattern recognition and machine learning. Berlin: Springer.

Blum, A., & Chawla, S. (2001). Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the 18th international conference on machine learning (pp. 19–26).

Blum, A., Lafferty, J., Rwebangira, M. R., & Reddy, R. (2004). Semi-supervised learning using randomized mincuts. In Proceedings of the 21st international conference on machine learning (p. 13).

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. In Proceedings of the 11th annual conference on computational learning theory (pp. 92–100). ACM.

Bruna, J., Zaremba, W., Szlam, A., & LeCun, Y. (2014). Spectral networks and locally connected networks on graphs. In International conference on learning, representations.

Chapelle, O., Chi, M., & Zien, A. (2006a). A continuation method for semi-supervised SVMs. In Proceedings of the 23rd international conference on machine learning (pp. 185–192).

Chapelle, O., Schölkopf, B., & Zien, A. (2006b). Semi-supervised learning (1st ed.). Cambridge: The MIT Press.

Chapelle, O., Sindhwani, V., & Keerthi, S. S. (2008). Optimization techniques for semi-supervised support vector machines. Journal of Machine Learning Research, 9, 203–233.

Chapelle, O., & Zien, A. (2005). Semi-supervised classification by low density separation. In Proceedings of the 10th international workshop on artificial intelligence and statistics (pp. 57–64).

Chen, K., & Wang, S. (2011). Semi-supervised learning via regularized boosting working on multiple semi-supervised assumptions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 129–143.

Chen, M., Chen, Y., & Weinberger, K. Q. (2011). Automatic feature decomposition for single view co-training. In Proceedings of the 28th international conference on machine learning (pp. 953–960).

Chen, T., & Guestrin, C. (2016). Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 785–794). ACM.

Christoudias, C. M., Urtasun, R., Kapoorz, A., & Darrell, T. (2009). Co-training with noisy perceptual observations. In Proceedings of the 2009 IEEE conference on computer vision and pattern recognition (pp. 2844–2851). IEEE.

Collobert, R., Sinz, F., Weston, J., & Bottou, L. (2006). Large scale transductive SVMs. Journal of Machine Learning Research, 7, 1687–1712.

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.

Corduneanu, A., & Jaakkola, T. (2003). On information regularization. In Proceedings of the 19th conference on uncertainty in artificial intelligence (pp. 151–158). Morgan Kaufmann Publishers Inc.

Cortes, C., & Mohri, M. (2007). On transductive regression. In Advances in neural information processing systems (pp. 305–312).

Cozman, F. G., Cohen, I., & Cirelo, M. C. (2003) Semi-supervised learning of mixture models. In Proceedings of the 20th international conference on machine learning (pp. 99–106).

Culp, M., & Michailidis, G. (2008). An iterative algorithm for extending learners to a semi-supervised setting. Journal of Computational and Graphical Statistics, 17(3), 545–571.

Dai, Z., Yang, Z., Yang, F., Cohen, W. W., & Salakhutdinov, R.R. (2017). Good semi-supervised learning that requires a bad gan. In Advances in neural information processing systems (pp. 6510–6520).

d’Alché Buc, F., Grandvalet, Y., & Ambroise, C. (2002). Semi-supervised marginboost. Advances in Neural Information Processing Systems, 1, 553–560.

Dara, R., Kremer, S. C., & Stacey, D. A. (2002). Clustering unlabeled data with SOMs improves classification of labeled real-world data. In Proceedings of the international joint conference on neural networks (Vol. 3, pp. 2237–2242). IEEE.

Dasgupta, S., Littman, M. L., & McAllester, D. A. (2002). PAC generalization bounds for co-training. In Advances in neural information processing systems (pp. 375–382).

de Bie, T., & Cristianini, N. (2004). Convex methods for transduction. In Advances in neural information processing systems (pp. 73–80).

de Bie, T., & Cristianini, N. (2006). Semi-supervised learning using semi-definite programming. In O. Chapelle, B. Schölkopf, & A. Zien (Eds.), Semi-supervised learning (pp. 119–135). Cambridge: The MIT Press.

de Sousa, C. A. R., Rezende, S. O., & Batista, G. E. (2013) Influence of graph construction on semi-supervised learning. In Proceedings of the joint European conference on machine learning and knowledge discovery in databases (pp. 160–175). Springer.

Demiriz, A., Bennett, K. P., & Embrechts, M. J. (1999). Semi-supervised clustering using genetic algorithms. In Artificial Neural Networks in Engineering (pp. 809–814).

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal statistical society, Series B, 39, 1–38.

Deng, C., & Zu Guo, M. (2011). A new co-training-style random forest for computer aided diagnosis. Journal of Intelligent Information Systems, 36(3), 253–281.

Denis, F., Gilleron, R., & Letouzey, F. (2005). Learning from positive and unlabeled examples. Theoretical Computer Science, 348(1), 70–83.

Doersch, C. (2016). Tutorial on variational autoencoders. arXiv:1606.05908.

Dópido, I., Li, J., Marpu, P. R., Plaza, A., Dias, J. M. B., & Benediktsson, J. A. (2013). Semisupervised self-learning for hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 51(7), 4032–4044.

Du, J., Ling, C. X., & Zhou, Z. H. (2011). When does cotraining work in real data? IEEE Transactions on Knowledge and Data Engineering, 23(5), 788–799.

Dua, D., & Graff, C. (2019). UCI machine learning repository. Retrieved September 12, 2019 from http://archive.ics.uci.edu/ml.

Duvenaud, D. K., Maclaurin, D., Iparraguirre, J., Bombarell, R., Hirzel, T., Aspuru-Guzik, A., & Adams, R. P. (2015). Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems (pp. 2224–2232).

Elkan, C., & Noto, K. (2008). Learning classifiers from only positive and unlabeled data. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 213–220). ACM.

Elsken, T., Metzen, J. H., & Hutter, F. (2019). Neural architecture search: A survey. Journal of Machine Learning Research, 20(55), 1–21.

Erhan, D., Bengio, Y., Courville, A., Manzagol, P. A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 625–660.

Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., & Hutter, F. (2015). Efficient and robust automated machine learning. In Advances in neural information processing systems (pp. 2962–2970).

Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.

Geng, B., Tao, D., Xu, C., Yang, L., & Hua, X. S. (2012). Ensemble manifold regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6), 1227–1233.

Goldberg, A. B., Zhu, X., Singh, A., Xu, Z., & Nowak, R. D. (2009). Multi-manifold semi-supervised learning. In Proceedings of the 12th international conference on artificial intelligence and statistics (pp. 169–176).

Goldman, S., & Zhou, Y. (2000) Enhancing supervised learning with unlabeled data. In Proceedings of the 17th international conference on machine learning (pp. 327–334).

Goodfellow, I. (2017). NIPS 2016 tutorial: Generative adversarial networks. arXiv:1701.00160.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. Cambridge: The MIT Press.

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014a). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672–2680).

Goodfellow, I., Shlens, J., & Szegedy, C. (2014b). Explaining and harnessing adversarial examples. arXiv:1412.6572.

Grabner, H., Leistner, C., Bischof, H. (2008). Semi-supervised on-line boosting for robust tracking. Proceedings of the 10th European conference on computer vision (pp. 234–247).

Grandvalet, Y., & Bengio, Y. (2005). Semi-supervised learning by entropy minimization. In Advances in neural information processing systems (pp. 529–536).

Grandvalet, Y., d’Alché Buc, F., & Ambroise, C. (2001). Boosting mixture models for semi-supervised learning. International conference on artificial neural networks (pp. 41–48).

Grira, N., Crucianu, M., & Boujemaa, N. (2004). Unsupervised and semisupervised clustering: A brief survey. In 7th ACM SIGMM international workshop on multimedia information retrieval.

Grover, A., & Leskovec, J. (2016). node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 855–864). ACM.

Guyon, I., & Elisseeff, A. (2006). An introduction to feature extraction. In I. Guyon, M. Nikravesh, S. Gunn, & L. A. Zadeh (Eds.), Feature extraction (pp. 1–25). Berlin: Springer.

Haffari, G. R., & Sarkar, A. (2007). Analysis of semi-supervised learning with the Yarowsky algorithm. In Proceedings of the 23rd conference on uncertainty in artificial intelligence (pp. 159–166).

Hammersley, J. M., & Clifford, P. (1971). Markov fields on finite graphs and lattices. Retrieved October 27, 2019 from http://www.statslab.cam.ac.uk/~grg/books/hammfest/hamm-cliff.pdf.

He, R., Zheng, W. S., Hu, B. G., & Kong, X. W. (2011). Nonnegative sparse coding for discriminative semi-supervised learning. In Proceedings of the 2011 IEEE conference on computer vision and pattern recognition (pp. 2849–2856). IEEE.

Hein, M., & Maier, M. (2007). Manifold denoising. In Advances in neural information processing systems (pp. 561–568).

Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.

Huang, B., & Jebara, T. (2011). Fast b-matching via sufficient selection belief propagation. In Proceedings of the 14th international conference on artificial intelligence and statistics (pp. 361–369).

Jayadeva, K. R., & Chandra, S. (2007). Twin support vector machines for pattern classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(5), 905–910.

Jebara, T., Wang, J., & Chang, S. F. (2009) Graph construction and b-matching for semi-supervised learning. In Proceedings of the 26th annual international conference on machine learning (pp. 441–448).

Joachims, T. (1999). Transductive inference for text classification using support vector machines. In Proceedings of the 16th international conference on machine learning (Vol. 99, pp. 200–209).

Joachims, T. (2003). Transductive learning via spectral graph partitioning. In Proceedings of the 20th international conference on machine learning (pp. 290–297).

Karasuyama, M., & Mamitsuka, H. (2013) Manifold-based similarity adaptation for label propagation. In Advances in neural information processing systems (pp. 1547–1555).

Kingma, D. P., Mohamed, S., Rezende, D. J., & Welling, M. (2014). Semi-supervised learning with deep generative models. In Advances in neural information processing systems (pp. 3581–3589).

Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. In International conference on learning, representations.

Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv:1609.02907.

Kiritchenko, S., & Matwin, S. (2001). Email classification with co-training. In Proceedings of the 2001 conference of the centre for advanced studies on collaborative research (P. 8). IBM press.

Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(1–3), 1–6.

Krizhevsky, A. (2009). Learning multiple layers of features from tiny images. Master’s thesis, University of Toronto, Department of Computer Science.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

Kveton, B., Valko, M., Rahimi, A., & Huang, L. (2010). Semi-supervised learning with max-margin graph cuts. In Proceedings of the 13th international conference on artificial intelligence and statistics (pp. 421–428).

Laine, S., & Aila, T. (2017). Temporal ensembling for semi-supervised learning. In International conference on learning, representations.

Lange, T., Law, M. H., Jain, A. K., & Buhmann, J. M. (2005). Learning with constrained and unlabelled data. In Proceedings of the 2005 IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 731–738). IEEE.

Lawrence, N. D., & Jordan, M. I. (2005). Semi-supervised learning via Gaussian processes. In Advances in neural information processing systems (pp. 753–760).

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436.

Lee, D. H. (2013). Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Proceedings of the 30th ICML workshop on challenges in representation learning (Vol. 3, p. 2).

Leistner, C., Saffari, A., Santner, J., Bischof, H. (2009). Semi-supervised random forests. In Proceedings of the IEEE 12th international conference on computer vision (pp. 506–513). IEEE.

Levatić, J., Ceci, M., Kocev, D., & Džeroski, S. (2017). Semi-supervised classification trees. Journal of Intelligent Information Systems, 49(3), 461–486.

Li, C., Xu, K., Zhu, J., & Zhang, B. (2017). Triple generative adversarial nets. arXiv:1703.02291.

Li, M., & Zhou, Z. H. (2007). Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man, and Cybernetics—Part A: Systems and Humans, 37(6), 1088–1098.

Li, S., & Fu, Y. (2013). Low-rank coding with b-matching constraint for semi-supervised classification. In Proceedings of the 23rd international joint conference on artificial intelligence (pp. 1472–1478).

Li, S., & Fu, Y. (2015). Learning balanced and unbalanced graphs via low-rank coding. IEEE Transactions on Knowledge and Data Engineering, 27(5), 1274–1287.

Li, Y. F., & Zhou, Z. H. (2015). Towards making unlabeled data never hurt. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(1), 175–188.

Liu, B., Lee, W. S., Yu, P. S., & Li, X. (2002). Partially supervised classification of text documents. In Proceedings of the 19th international conference on machine learning (Vol. 2, pp. 387–394).

Liu, G., Lin, Z., & Yu, Y. (2010a). Robust subspace segmentation by low-rank representation. In Proceedings of the 27th international conference on machine learning (pp. 663–670).

Liu, W., & Chang, S. F. (2009). Robust multi-class transductive learning with graphs. In Proceedings of the 2009 IEEE conference on computer vision and pattern recognition (pp. 381–388). IEEE.

Liu, W., He, J., & Chang, S. F. (2010b). Large graph construction for scalable semi-supervised learning. In Proceedings of the 27th international conference on machine learning (pp. 679–686).

Liu, X., Song, M., Tao, D., Liu, Z., Zhang, L., Chen, C., & Bu, J. (2013). Semi-supervised node splitting for random forest construction. In Proceedings of the 2013 IEEE conference on computer vision and pattern recognition (pp. 492–499). IEEE.

Liu, W., Wang, J., & Chang, S. F. (2012). Robust and scalable graph-based semisupervised learning. Proceedings of the IEEE, 100(9), 2624–2638.

Liu, X., Song, M., Tao, D., Liu, Z., Zhang, L., Chen, C., et al. (2015). Random forest construction with robust semisupervised node splitting. IEEE Transactions on Image Processing, 24(1), 471–483.

Lu, Q., Getoor, L. (2003). Link-based classification. In Proceedings of the 20th international conference on machine learning (pp. 496–503).

Luo, Y., Zhu, J., Li, M., Ren, Y., & Zhang, B. (2018). Smooth neighbors on teacher graphs for semi-supervised learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 8896–8905).

Maier, M., Luxburg, U. V., & Hein, M. (2009). Influence of graph construction on graph-based clustering measures. In Advances in neural information processing systems (pp. 1025–1032).

Mallapragada, P. K., Jin, R., Jain, A. K., & Liu, Y. (2009). Semiboost: Boosting for semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(11), 2000–2014.

Melacci, S., & Belkin, M. (2011). Laplacian support vector machines trained in the primal. Journal of Machine Learning Research, 12, 1149–1184.

Mihalcea, R. (2004). Co-training and self-training for word sense disambiguation. In Proceedings of the 8th conference on computational natural language learning.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013) Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111–3119).

Miyato, T., Maeda, S. I., Koyama, M., & Ishii, S. (2018). Virtual adversarial training: A regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1979–1993.

Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., & Ng, A.Y. (2011). Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning.

Neville, J., & Jensen, D. (2000). Iterative classification in relational data. In Proceedings of the 17th AAAI workshop on learning statistical models from relational data (pp. 13–20).

Nigam, K., & Ghani, R. (2000). Analyzing the effectiveness and applicability of co-training. In Proceedings of the 9th international conference on information and knowledge management (pp. 86–93). ACM.

Nigam, K., McCallum, A., Mitchell, T. (2006). Semi-supervised text classification using EM. In Semi-Supervised Learning (pp. 33–56).

Nigam, K., McCallum, A. K., Thrun, S., & Mitchell, T. (2000). Text classification from labeled and unlabeled documents using EM. Machine Learning, 39(2), 103–134.

Niyogi, P. (2008). Manifold regularization and semi-supervised learning: Some theoretical analyses. Journal of Machine Learning Research, 14(1), 1229–1250.

Odena, A. (2016). Semi-supervised learning with generative adversarial networks. arXiv:1606.01583.

Oliver, A., Odena, A., Raffel, C., Cubuk, E. D., Goodfellow, I. J. (2018). Realistic evaluation of deep semi-supervised learning algorithms. arXiv:1804.09170.

Oshiro, T. M., Perez, P. S., & Baranauskas, J. A. (2012). How many trees in a random forest? In Proceedings of the international workshop on machine learning and data mining in pattern recognition (pp. 154–168). Springer.

Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proceedings of the 42nd annual meeting on association for computational linguistics, association for computational linguistics (p. 271).

Park, S., Park, J., Shin, S., & Moon, I. (2018). Adversarial dropout for supervised and semi-supervised learning. In Proceedings of the thirty-second AAAI conference on artificial intelligence (pp. 3917–3924).

Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., & Lerer, A. (2017). Automatic differentiation in pytorch. In NIPS Autodiff workshop.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in python. Journal of Machine Learning Research, 12, 2825–2830.

Perozzi, B., Al-Rfou, R., & Skiena, S. (2014). Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 701–710). ACM.

Pezeshki, M., Fan, L., Brakel, P., Courville, A., & Bengio, Y. (2016). Deconstructing the ladder network architecture. In Proceedings of the 33rd international conference on machine learning (pp. 2368–2376).

Pitelis, N., Russell, C., & Agapito, L. (2013). Learning a manifold as an atlas. In Proceedings of the 2013 IEEE conference on computer vision and pattern recognition (pp. 1642–1649). IEEE.

Pitelis, N., Russell, C., & Agapito, L. (2014). Semi-supervised learning using an unsupervised atlas. In Proceedings of the joint European conference on machine learning and knowledge discovery in databases (pp. 565–580). Springer.

Prémont-Schwarz, I., Ilin, A., Hao, T., Rasmus, A., Boney, R., & Valpola, H. (2017). Recurrent ladder networks. In: I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, R. Garnett (eds.), Advances in neural information processing systems (pp. 6009–6019).

Provost, F., & Domingos, P. (2003). Tree induction for probability-based ranking. Machine Learning, 52(3), 199–215.

Qi, Z., Tian, Y., & Shi, Y. (2012). Laplacian twin support vector machine for semi-supervised classification. Neural Networks, 35, 46–53.

Rasmus, A., Berglund, M., Honkala, M., Valpola, H., & Raiko, T. (2015). Semi-supervised learning with ladder networks. In Advances in neural information processing systems (pp. 3546–3554).

Ratle, F., Camps-Valls, G., & Weston, J. (2010). Semisupervised neural networks for efficient hyperspectral image classification. IEEE Transactions on Geoscience and Remote Sensing, 48(5), 2271–2282.

Rifai, S., Dauphin, Y. N., Vincent, P., Bengio, Y., & Muller, X. (2011a). The manifold tangent classifier. In Advances in neural information processing systems (pp. 2294–2302).

Rifai, S., Vincent, P., Muller, X., Glorot, X., & Bengio, Y. (2011b). Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th international conference on machine learning (pp. 833–840).

Rosenberg, C., Hebert, M., & Schneiderman, H. (2005). Semi-supervised self-training of object detection models. In Proceedings of the 7th IEEE workshop on applications of computer vision (pp. 29–36).

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.

Sajjadi, M., Javanmardi, M., & Tasdizen, T. (2016). Regularization with stochastic transformations and perturbations for deep semi-supervised learning. In Advances in neural information processing systems (pp. 1163–1171).

Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training gans. In Advances in neural information processing systems (pp. 2234–2242).

Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., & Eliassi-Rad, T. (2008). Collective classification in network data. AI Magazine, 29(3), 93.

Settles, B. (2012). Active learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1), 1–114.

Sheikhpour, R., Sarram, M. A., Gharaghani, S., & Chahooki, M. A. Z. (2017). A survey on semi-supervised feature selection methods. Pattern Recognition, 64, 141–158.

Shental, N., & Domany, E. (2005). Semi-supervised learning—A statistical physics approach. In Proceedings of the 22nd ICML workshop on learning with partially classified training data.

Sindhwani, V., Niyogi, P., & Belkin, M. (2005). A co-regularization approach to semi-supervised learning with multiple views. In Proceedings of the 22nd ICML workshop on learning with multiple views (pp. 74–79).

Sindhwani, V., & Rosenberg, D. S. (2008). An RKHS for multi-view learning and manifold co-regularization. In Proceedings of the 25th international conference on machine learning (pp. 976–983).

Singh, A., Nowak, R., & Zhu, X. (2009) Unlabeled data: Now it helps, now it doesn’t. In Advances in neural information processing systems (pp. 1513–1520).

Solomon, J., Rustamov, R., Guibas, L., & Butscher, A. (2014) Wasserstein propagation for semi-supervised learning. In Proceedings of the 31st international conference on machine learning (pp. 306–314).

Springenberg, J. T. (2015). Unsupervised and semi-supervised learning with categorical generative adversarial networks. arXiv:1511.06390.

Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.

Subramanya, A., & Bilmes, J. (2008). Soft-supervised learning for text classification. In Proceedings of the conference on empirical methods in natural language processing, association for computational linguistics (pp. 1090–1099).

Subramanya, A., & Bilmes, J. (2011). Semi-supervised learning with measure propagation. Journal of Machine Learning Research, 12, 3311–3370.

Subramanya, A., & Talukdar, P. P. (2014). Graph-based semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 8(4), 1–125.

Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., et al. (2013). Intriguing properties of neural networks. arXiv:1312.6199.

Szummer, M., & Jaakkola, T. (2002) Partially labeled classification with Markov random walks. In Advances in neural information processing systems (pp. 945–952).

Szummer, M., & Jaakkola, T. S. (2003) Information regularization with partially labeled data. In Advances in neural information processing systems (pp. 1049–1056).

Talukdar, P. P., & Crammer, K. (2009). New regularized algorithms for transductive learning. In Proceedings of the joint European conference on machine learning and knowledge discovery in databases (pp. 442–457). Springer.

Talukdar, P. P., Reisinger, J., Paşca, M., Ravichandran, D., Bhagat, R., & Pereira, F. (2008). Weakly-supervised acquisition of labeled class instances using graph random walks. In Proceedings of the conference on empirical methods in natural language processing, association for computational linguistics (pp. 582–590).

Tan, C., Lee, L., Tang, J., Jiang, L., Zhou, M., & Li, P. (2011). User-level sentiment analysis incorporating social networks. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1397–1405). ACM.

Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., & Mei, Q. (2015). Line: Large-scale information network embedding. In Proceedings of the 24th international conference on world wide web, international world wide web conferences steering committee (pp. 1067–1077).

Tanha, J., van Someren, M., & Afsarmanesh, H. (2012). An adaboost algorithm for multiclass semi-supervised learning. In Proceedings of the 12th IEEE international conference on data mining (pp. 1116–1121). IEEE.

Tanha, J., van Someren, M., & Afsarmanesh, H. (2017). Semi-supervised self-training for decision tree classifiers. International Journal of Machine Learning and Cybernetics, 8(1), 355–370.

Tarvainen, A., & Valpola, H. (2017). Weight-averaged consistency targets improve semi-supervised deep learning results. In Advances in neural information processing systems (pp. 1195–1204).

Thornton, C., Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2013). Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 847–855). ACM.

Triguero, I., García, S., & Herrera, F. (2015). Self-labeled techniques for semi-supervised learning: Taxonomy, software and empirical study. Knowledge and Information Systems, 42(2), 245–284.

Triguero, I., González, S., Moyano, J. M., García López, S., Alcalá Fernández, J., Luengo Martín, J., et al. (2017). KEEL 3.0: An open source software for multi-stage analysis in data mining. International Journal of Computational Intelligence Systems, 10, 1238–1249.

Urner, R., Ben-David, S., & Shalev-Shwartz, S. (2011). Access to unlabeled data can speed up prediction time. In Proceedings of the 28th international conference on machine learning (pp. 641–648).

Valizadegan, H., Jin, R., & Jain, A. K. (2008). Semi-supervised boosting for multi-class classification. In Joint European conference on machine learning and knowledge discovery in databases (pp. 522–537). Springer.

Vapnik, V. (1998). Statistical learning theory (Vol. 1). New York: Wiley.

Verma, V., Lamb, A., Kannala, J., Bengio, Y., & Lopez-Paz, D. (2019). Interpolation consistency training for semi-supervised learning. arXiv:1903.03825.

Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P. A. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th international conference on machine learning (pp. 1096–1103).

Wager, S., Wang, S., & Liang, P. S. (2013). Dropout training as adaptive regularization. In Advances in neural information processing systems (pp. 351–359).

Wan, X. (2009). Co-training for cross-lingual sentiment classification. In Proceedings of the 47th annual meeting of the ACL, association for computational linguistics (pp. 235–243).

Wang, D., Cui, P., & Zhu, W. (2016). Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 1225–1234). ACM.

Wang, F., & Zhang, C. (2008). Label propagation through linear neighborhoods. IEEE Transactions on Knowledge and Data Engineering, 20(1), 55–67.

Wang, J., Jebara, T., & Chang, S. F. (2008a). Graph transduction via alternating minimization. In Proceedings of the 25th international conference on machine learning (pp. 1144–1151).

Wang, J., Jebara, T., & Chang, S. F. (2013). Semi-supervised learning using greedy max-cut. Journal of Machine Learning Research, 14, 771–800.

Wang, J., Luo, S. W., & Zeng, X. H. (2008b). A random subspace method for co-training. In Proceedings of the IEEE international joint conference on neural networks (pp. 195–200). IEEE.

Wang, W., & Zhou, Z. H. (2007). Analyzing co-training style algorithms. In Proceedings of the 18th European conference on machine learning (pp. 454–465). Springer.

Wang, W., & Zhou, Z. H. (2010). A new analysis of co-training. In Proceedings of the 27th international conference on machine learning (pp. 1135–1142).

Weston, J., Ratle, F., & Collobert, R. (2008). Deep learning via semi-supervised embedding. In Proceedings of the 25th international conference on machine learning (pp. 1168–1175).

Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis. Chemometrics and Intelligent Laboratory Systems, 2(1–3), 37–52.

Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., & Ma, Y. (2009). Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 210–227.

Wu, X. M., Li, Z., So, A. M., Wright, J., & Chang, S. F. (2012a). Learning with partially absorbing random walks. In Advances in neural information processing systems (pp. 3077–3085).

Wu, Z., Wu, J., Cao, J., & Tao, D. (2012b). Hysad: A semi-supervised hybrid shilling attack detector for trustworthy product recommendation. In Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 985–993). ACM.

Xu, C., Tao, D., & Xu, C. (2013). A survey on multi-view learning. arXiv:1304.5634.

Xu, J., He, H., & Man, H. (2012). Dcpe co-training for classification. Neurocomputing, 86, 75–85.

Xu, L., & Schuurmans, D. (2005). Unsupervised and semi-supervised multi-class support vector machines. In Proceedings of the 20th national conference on artificial intelligence (Vol. 5, p. 13).

Yan, S., & Wang, H. (2009). Semi-supervised learning by sparse representation. In Proceedings of the 2009 SIAM international conference on data mining (pp. 792–801). SIAM.

Yang, Z., Cohen, W. W., & Salakhutdinov, R. (2016). Revisiting semi-supervised learning with graph embeddings. In Proceedings of the 33rd international conference on machine learning (pp. 40–48).

Yarowsky, D. (1995). Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd annual meeting of the association for computational linguistics, association for computational linguistics (pp. 189–196).

Yaslan, Y., & Cataltepe, Z. (2010). Co-training with relevant random subspaces. Neurocomputing, 73(10), 1652–1661.

Yu, S., Krishnapuram, B., Rosales, R., & Rao, R. B. (2011). Bayesian co-training. Journal of Machine Learning Research, 12, 2649–2680.

Zhang, H., Cisse, M., Dauphin, Y. N., & Lopez-Paz, D. (2018). mixup: Beyond empirical risk minimization. In International conference on learning representations.

Zhang, K., Kwok, J. T., & Parvin, B. (2009). Prototype vector machine for large scale semi-supervised learning. In Proceedings of the 26th international conference on machine learning (pp. 1233–1240).

Zhang, W., & Zheng, Q. (2009). Tsfs: A novel algorithm for single view co-training. In Proceedings of the 2nd IEEE international joint conference on computational sciences and optimization (Vol. 1, pp. 492–496). IEEE.

Zhou, D., Bousquet, O., Lal, T. N., Weston, J., & Schölkopf, B. (2004). Learning with local and global consistency. In Advances in neural information processing systems (pp. 321–328).

Zhou, Y., & Goldman, S. (2004). Democratic co-learning. In Proceedings of the 16th IEEE international conference on tools with artificial intelligence (pp. 594–602). IEEE.

Zhou, Z. H. (2012). Ensemble methods: Foundations and algorithms. Boca Raton: CRC Press.

Zhou, Z. H., & Li, M. (2005a). Semi-supervised regression with co-training. In Proceedings of the 19th international joint conference on artificial intelligence (Vol. 5, pp. 908–913).

Zhou, Z. H., & Li, M. (2005b). Tri-training: Exploiting unlabeled data using three classifiers. IEEE Transactions on Knowledge and Data Engineering, 17(11), 1529–1541.

Zhou, Z. H., & Li, M. (2010). Semi-supervised learning by disagreement. Knowledge and Information Systems, 24(3), 415–439.

Zhu, X. (2005). Semi-supervised learning with graphs. Ph.D. thesis, Carnegie Mellon University.

Zhu, X. (2008). Semi-supervised learning literature survey. Technical Report 1530, University of Wisconsin-Madison.

Zhu, X., & Ghahramani, Z. (2002a). Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University.

Zhu, X., & Ghahramani, Z. (2002b). Towards semi-supervised classification with Markov random fields. Technical Report CMU-CALD-02-106, Carnegie Mellon University.

Zhu, X., Ghahramani, Z., & Lafferty, J. D. (2003). Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the 20th international conference on machine learning (pp. 912–919).

Zhu, X., & Goldberg, A. B. (2009). Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3(1), 1–130.

Zhu, X., & Lafferty, J. (2005). Harmonic mixtures: Combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. In Proceedings of the 22nd international conference on machine learning (pp. 1052–1059). ACM.

Zhuang, L., Gao, H., Lin, Z., Ma, Y., Zhang, X., & Yu, N. (2012). Non-negative low rank and sparse graph for semi-supervised learning. In Proceedings of the 2012 IEEE conference on computer vision and pattern recognition (pp. 2328–2335). IEEE.