Extreme value theory for anomaly detection – the GPD classifier
Abstract
Classification tasks usually assume that all possible classes are present during the training phase. This assumption is restrictive if the algorithm is deployed over a long period and may encounter samples from previously unseen classes. It is therefore essential to develop algorithms that can distinguish between normal and abnormal test data. In recent years, extreme value theory has become an important tool in multivariate statistics and machine learning. The recently introduced extreme value machine, a classifier motivated by extreme value theory, addresses this problem and achieves competitive performance in specific cases. We show that this algorithm suffers from theoretical and practical drawbacks and can fail even when the recognition task is fairly simple. To overcome these limitations, we propose two new algorithms for anomaly detection that rely on approximations from extreme value theory and are more robust in such cases. We exploit the intuition that test points lying extremely far from the training classes are more likely to be abnormal objects, and we derive asymptotic results motivated by univariate extreme value theory that make this intuition precise. We demonstrate the effectiveness of our classifiers in simulations and on real data sets.
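To make the intuition concrete, the following is a minimal peaks-over-threshold sketch, not the paper's exact algorithm: nearest-neighbour distances to a training class are treated as the anomaly score, a Generalized Pareto Distribution (GPD) is fitted to their exceedances over a high threshold, and a test point far in the tail receives a small tail probability. The toy Gaussian class, the 0.90 threshold level, and the use of `scipy.stats.genpareto` are illustrative assumptions.

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)

# Toy "normal" class: a standard Gaussian sample in 2D (illustrative assumption).
X_train = rng.standard_normal((1000, 2))

# Leave-one-out nearest-neighbour distances within the training class.
d = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)          # exclude the zero self-distance
train_dist = d.min(axis=1)

# Peaks over threshold: keep only exceedances above a high empirical quantile.
q = 0.90
u = np.quantile(train_dist, q)
excesses = train_dist[train_dist > u] - u

# Fit a GPD to the exceedances, with location fixed at 0 as in the POT model.
shape, _, scale = genpareto.fit(excesses, floc=0)

def anomaly_score(x):
    """Estimated tail probability P(D > d(x)); small values suggest an anomaly."""
    dist = np.linalg.norm(X_train - x, axis=1).min()
    if dist <= u:
        return 1.0  # not in the tail: treated as normal
    # P(D > u) * P(D - u > dist - u | D > u), with the GPD tail approximation.
    return (1 - q) * genpareto.sf(dist - u, shape, loc=0, scale=scale)

# A point far from the class should score far lower than a central one.
print(anomaly_score(np.array([0.0, 0.0])))
print(anomaly_score(np.array([10.0, 10.0])))
```

The decision rule is then a simple threshold on `anomaly_score`; the GPD approximation is what lets the score extrapolate beyond the largest distance seen in training.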