Optimised one-class classification performance

Oliver Urs Lenz1, Daniel Peralta2, Chris Cornelis1
1Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium
2IDLab, Department of Information Technology, Ghent University – IMEC, Ghent, Belgium

Abstract

We provide a thorough treatment of one-class classification with hyperparameter optimisation for five data descriptors: Support Vector Machine (SVM), Nearest Neighbour Distance (NND), Localised Nearest Neighbour Distance (LNND), Local Outlier Factor (LOF) and Average Localised Proximity (ALP). The hyperparameters of SVM and LOF have to be optimised through cross-validation, whereas NND, LNND and ALP allow an efficient form of leave-one-out validation and the reuse of a single nearest-neighbour query. We experimentally evaluate the effect of hyperparameter optimisation on 246 classification problems drawn from 50 datasets. Among a selection of optimisation algorithms, the recent Malherbe–Powell proposal optimises the hyperparameters of all data descriptors most efficiently. We calculate the increase in test AUROC and the amount of overfitting as a function of the number of hyperparameter evaluations. After 50 evaluations, ALP and SVM significantly outperform LOF, NND and LNND, and LOF and NND outperform LNND. The performance of ALP and SVM is comparable, but ALP can be optimised more efficiently, so it constitutes a good default choice. Alternatively, using validation AUROC as a selection criterion between ALP and SVM gives the best overall result, and NND is the least computationally demanding option. We thus end up with a clear trade-off between three choices, allowing practitioners to make an informed decision.
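The efficiency gain mentioned above for the nearest-neighbour-based descriptors comes from the fact that a single query for the largest candidate number of neighbours serves all hyperparameter values at once. As a minimal illustrative sketch (not the authors' implementation), the following scores test points with an NND-style descriptor — distance to the k-th nearest training neighbour — and evaluates several values of k from one precomputed neighbour query; the synthetic data and the choice of k_max are assumptions for the example:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 5))                      # target-class training data
X_test = np.vstack([rng.normal(size=(50, 5)),            # test targets
                    rng.normal(loc=3.0, size=(50, 5))])  # test outliers
y_test = np.array([1] * 50 + [0] * 50)                   # 1 = target, 0 = outlier

# One neighbour query with the largest candidate k covers all smaller values.
k_max = 20
nn = NearestNeighbors(n_neighbors=k_max).fit(X_train)
dist_test, _ = nn.kneighbors(X_test)

# Evaluating any k <= k_max now requires no further neighbour searches:
for k in (1, 5, 20):
    scores = -dist_test[:, k - 1]  # smaller distance = more target-like
    auroc = roc_auc_score(y_test, scores)
```

The same principle applies to leave-one-out validation on the training set: querying each training point's k_max + 1 nearest neighbours once (the first neighbour being the point itself) lets every candidate k be scored without repeating the search.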

Keywords


References

Agarwal, S., & Sureka, A. (2015). Using KNN and SVM based one-class classifier for detecting online radicalization on Twitter. ICDCIT 2015: Proceedings of the 11th international conference on distributed computing and internet technology (pp. 431–442). Springer.

Antal, M., & Szabó, L. Z. (2015). An evaluation of one-class and two-class classification algorithms for keystroke dynamics authentication on mobile devices. CSCS 2015: Proceedings of the 20th international conference on control systems and computer science (pp. 343–350). IEEE.

Ban, T., & Abe, S. (2006). Implementing multi-class classifiers by one-class classification methods. IJCNN 2006: Proceedings of the IEEE international joint conference on neural networks (pp. 327–332). IEEE.

Bekker, J., & Davis, J. (2020). Learning from positive and unlabeled data: A survey. Machine Learning, 109(4), 719–760.

Benavoli, A., Corani, G., & Mangili, F. (2016). Should we really use post-hoc tests based on mean-ranks? Journal of Machine Learning Research, 17(5), 152–161.

Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. NIPS 2011: Proceedings of the 25th annual conference on neural information processing systems (Vol. 24, pp. 2546–2554). NIPS.

Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(10), 281–305.

Betrò, B. (1991). Bayesian methods in global optimization. Journal of Global Optimization, 1(1), 1–14.

Blank, J., & Deb, K. (2020). Pymoo: Multi-objective optimization in Python. IEEE Access, 8, 89497–89509.

Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000). LOF: Identifying density-based local outliers. SIGMOD 2000: Proceedings of the ACM international conference on Management of data (pp. 93–104). ACM.

Brochu, E., Cora, V. M., & de Freitas, N. (2009). A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Tech. Rep. UBC TR-2009-023, University of British Columbia, Department of Computer Science. Retrieved from http://arxiv.org/abs/1012.2599

Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

de Ridder, D., Tax, D. M. J., & Duin, R. P. W. (1998). An experimental comparison of one-class classification methods. ASCI '98: Proceedings of the fourth annual conference of the advanced school for computing and imaging (pp. 213–218). ASCI.

Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7(1), 1–30.

Dua, D., & Graff, C. (2019). UCI machine learning repository. Retrieved from http://archive.ics.uci.edu/ml

Hadjadji, B., & Chibani, Y. (2018). Two combination stages of clustered one-class classifiers for writer identification from text fragments. Pattern Recognition, 82, 147–162.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.

Hooke, R., & Jeeves, T. A. (1961). “Direct search” solution of numerical and statistical problems. Journal of the ACM, 8(2), 212–229.

Janssens, J. H. M., Flesch, I., & Postma, E. O. (2009). Outlier detection with one-class classifiers from ML and KDD. ICMLA 2009: Proceedings of the Eighth International Conference on Machine Learning and Applications (pp. 147–153). IEEE.

Jones, D. R. (2001). A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21(4), 345–383.

King, D. E. (2009). Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10(60), 1755–1758.

King, D. E. (2017). A global optimization algorithm worth using. Retrieved January 6, 2021 from http://blog.dlib.net/2017/12/a-global-optimization-algorithm-worth.html

Knorr, E. M., & Ng, R. T. (1997). A unified notion of outliers: Properties and computation. KDD-97: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (pp. 219–222). AAAI.

Kushner, H. J. (1962). A versatile stochastic model of a function of unknown and time varying form. Journal of Mathematical Analysis and Applications, 5(1), 150–167.

Kushner, H. J. (1964). A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. Journal of Basic Engineering, 86(1), 97–106.

Lenz, O. U., Peralta, D., & Cornelis, C. (2020). fuzzy-rough-learn 0.1: A Python library for machine learning with fuzzy rough sets. IJCRS 2020: Proceedings of the International Joint Conference on Rough Sets (pp. 491–499). Springer.

Lenz, O. U., Peralta, D., & Cornelis, C. (2021). Average Localised Proximity: A new data descriptor with good default one-class classification performance. Pattern Recognition, 118, 107991.

Malherbe, C., & Vayatis, N. (2017). Global optimization of Lipschitz functions. ICML 2017: Proceedings of the 34th international conference on machine learning (Vol. 70, pp. 2314–2323). PMLR.

Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308–313.

Paleyes, A., Pullin, M., Mahsereci, M., Lawrence, N., & González, J. (2019). Emulation of physical processes with Emukit. NeurIPS 2019: Workshop on Machine Learning and the Physical Sciences. NeurIPS.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(85), 2825–2830.

Powell, M. J. D. (2004). The NEWUOA software for unconstrained optimization without derivatives. Tech. Rep. NA2004/08, University of Cambridge, Department of Applied Mathematics and Theoretical Physics. Retrieved from http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2004_08.pdf

Powell, M. J. D. (2009). The BOBYQA algorithm for bound constrained optimization without derivatives. Tech. Rep. NA2009/06, University of Cambridge, Department of Applied Mathematics and Theoretical Physics. Retrieved from http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf

Ribeiro, R. P., Pereira, P., & Gama, J. (2016). Sequential anomalies: A study in the railway industry. Machine Learning, 105(1), 127–153.

Rodríguez-Ruiz, J., Mata-Sánchez, J. I., Monroy, R., Loyola-Gonzalez, O., & López-Cuevas, A. (2020). A one-class classification approach for bot detection on Twitter. Computers & Security, 91, 101715.

Rosner, B., Glynn, R. J., & Lee, M. L. T. (2006). The Wilcoxon signed rank test for paired comparisons of clustered data. Biometrics, 62(1), 185–192.

Rousseeuw, P. J., & Croux, C. (1993). Alternatives to the median absolute deviation. Journal of the American Statistical Association, 88(424), 1273–1283.

Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., & Williamson, R. C. (1999). Estimating the support of a high-dimensional distribution. Tech. Rep. MSR-TR-99-87, Microsoft Research, Redmond, Washington.

Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., & Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation, 13(7), 1443–1471.

Sokolov, A., Paull, E. O., & Stuart, J. M. (2016). One-class detection of cell states in tumor subtypes. PSB 2016: Proceedings of the 21st Pacific Symposium on Biocomputing (pp. 405–416). World Scientific.

Spendley, W., Hext, G. R., & Himsworth, F. R. (1962). Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics, 4(4), 441–461.

Stephenson, W., Frangella, Z., Udell, M., & Broderick, T. (2021). Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression. NeurIPS 2021: Proceedings of the thirty-fifth conference on neural information processing systems (Vol. 34). NeurIPS.

Swersky, L., Marques, H. O., Sander, J., Campello, R. J. G. B., & Zimek, A. (2016). On the evaluation of outlier detection and one-class classification methods. DSAA 2016: Proceedings of the 3rd IEEE International Conference on Data Science and Advanced Analytics (pp. 1–10). IEEE.

Tax, D. M. J. (2001). One-class classification: Concept learning in the absence of counter-examples. PhD thesis, Technische Universiteit Delft.

Tax, D. M. J., & Duin, R. P. W. (1998). Outlier detection using classifier instability. SSPR/SPR 1998: Proceedings of the Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition and Structural and Syntactic Pattern Recognition (pp. 593–601). Springer.

Tax, D. M. J., & Duin, R. P. W. (1999). Data domain description using support vectors. ESANN 1999: Proceedings of the Seventh European Symposium on Artificial Neural Networks, D-Facto (pp. 251–256). ESANN.

Tax, D. M. J., & Duin, R. P. W. (1999). Support vector domain description. Pattern Recognition Letters, 20(11–13), 1191–1199.

Tax, D. M. J., & Duin, R. P. W. (2004). Support vector data description. Machine Learning, 54(1), 45–66.

Torczon, V. J. (1989). Multidirectional search: A direct search algorithm for parallel machines. PhD thesis, Rice University.

Vigna, S. (2015). A weighted correlation index for rankings with ties. WWW '15: Proceedings of the 24th international conference on World Wide Web (pp. 1166–1176).

Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., … SciPy 1.0 Contributors. (2020). SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17(3), 261–272.

Wright, M. H. (1995). Direct search methods: Once scorned, now respectable. Numerical analysis 1995: Proceedings of the 16th Dundee Biennial conference on numerical analysis, Pitman research notes in mathematics series (Vol. 344, pp. 191–208). Longman.

Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multicriteria decisionmaking. IEEE Transactions on Systems, Man, and Cybernetics, 18(1), 183–190.