Fast greedy $\mathcal{C}$-bound minimization with guarantees

Machine Learning, Volume 109, pages 1945–1986, 2020
Baptiste Bauvin1,2, Cécile Capponi2, Jean-Francis Roy3, François Laviolette2,4
1Aix-Marseille University, University of Toulon, CNRS, LIS (Qarma), Marseille, France
2Department of Computer Science and Software Engineering, Université Laval, Québec, Canada
3Coveo Solutions Inc., Québec, Canada
4Element AI, Montréal, Canada

Abstract

The $\mathcal{C}$-bound is a tight bound on the true risk of a majority vote classifier that relies on the individual quality and pairwise disagreement of the voters, and it comes with PAC-Bayesian generalization guarantees. Based on this bound, MinCq is a classification algorithm that returns a dense distribution on a finite set of voters by minimizing it. Introduced later and inspired by boosting, CqBoost uses a column generation approach to build a sparse, $\mathcal{C}$-bound-optimal distribution on a possibly infinite set of voters. However, both approaches have a high computational learning time because they minimize the $\mathcal{C}$-bound by solving a quadratic program. One advantage of CqBoost, however, is its experimentally observed ability to provide sparse solutions. In this work, we address the problem of accelerating the $\mathcal{C}$-bound minimization process while preserving the sparsity of the solution and without losing accuracy. We present CB-Boost, a computationally efficient classification algorithm relying on a greedy, boosting-based $\mathcal{C}$-bound optimization. An in-depth analysis proves the optimality of the greedy minimization process and quantifies the decrease of the $\mathcal{C}$-bound achieved by the algorithm. Generalization guarantees are then derived from existing PAC-Bayesian theorems. In addition, we experimentally evaluate the relevance of CB-Boost with respect to the three main properties we expect of it: accuracy, sparsity, and computational efficiency compared to MinCq, CqBoost, AdaBoost, and other ensemble methods. As observed in these experiments, CB-Boost not only achieves results comparable to the state of the art, but also provides $\mathcal{C}$-bound sub-optimal weights with very low computational demand while keeping the sparsity property of CqBoost.
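For intuition, recall the form of the empirical $\mathcal{C}$-bound used in this line of work: writing $M_Q(x, y) = y \, \mathbf{E}_{h \sim Q}\, h(x)$ for the margin of the $Q$-weighted majority vote on an example $(x, y)$ with $y \in \{-1, +1\}$, and $\mu_1 = \mathbf{E}\, M_Q$, $\mu_2 = \mathbf{E}\, M_Q^2$ for its first two moments, the bound reads $1 - \mu_1^2 / \mu_2$ whenever $\mu_1 > 0$. The sketch below illustrates, under our own simplifying assumptions, how a greedy minimization of this quantity can proceed: starting from a single voter, each round adds the (voter, weight) pair that most decreases the empirical $\mathcal{C}$-bound. CB-Boost derives the optimal weight of the incoming voter analytically; here that derivation is replaced by a plain one-dimensional numerical search, and the function and variable names (greedy_cbound_boost, c_bound, and so on) are ours, not the authors'.

import numpy as np
from scipy.optimize import minimize_scalar

def c_bound(margins):
    # Empirical C-bound: 1 - mu_1^2 / mu_2, where mu_1 and mu_2 are the first and
    # second moments of the majority-vote margin. The ratio is invariant to a
    # positive rescaling of the weights, so the margins need not be normalized here.
    mu1 = margins.mean()
    mu2 = (margins ** 2).mean()
    if mu1 <= 0 or mu2 == 0:
        return 1.0  # the bound is only informative when the mean margin is positive
    return 1.0 - mu1 ** 2 / mu2

def greedy_cbound_boost(H, y, n_rounds=50):
    # H: (n_samples, n_voters) matrix of voter outputs in {-1, +1}
    # y: (n_samples,) labels in {-1, +1}
    # Returns a sparse, normalized weight vector over the voters.
    voter_margins = H * y[:, None]                  # per-voter margins y * h(x)
    weights = np.zeros(H.shape[1])
    first = np.argmax(voter_margins.mean(axis=0))   # start from the best single voter
    weights[first] = 1.0
    F = voter_margins[:, first].astype(float)       # margin of the current (unnormalized) vote

    for _ in range(n_rounds):
        best_cb, best_j, best_a = c_bound(F), None, None
        for j in range(H.shape[1]):
            # one-dimensional search for the weight of candidate j that most
            # reduces the C-bound of the combined vote F + a * h_j
            res = minimize_scalar(
                lambda a, col=voter_margins[:, j]: c_bound(F + a * col),
                bounds=(0.0, 10.0), method="bounded")
            if res.fun < best_cb - 1e-12:
                best_cb, best_j, best_a = res.fun, j, res.x
        if best_j is None:                          # no voter improves the bound: stop
            break
        weights[best_j] += best_a
        F += best_a * voter_margins[:, best_j]
    return weights / weights.sum()

With, say, decision stumps as the pool of voters in H, only the voters selected by the loop receive non-zero weight, which mirrors the sparsity behaviour the abstract attributes to CqBoost and CB-Boost.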
