The Big Data razor

European Journal for Philosophy of Science - Volume 10 - Pages 1-20 - 2020
Ezequiel López-Rubio1,2
1Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga (UMA), Málaga, Spain
2Departamento de Lógica, Historia y Filosofía de la Ciencia, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain

Abstract

Classic conceptions of model simplicity for machine learning are mainly based on analyzing the structure of the model. The best known of them are the Bayesian, frequentist, information-theoretic, and expressive-power concepts, which are reviewed in this work along with their underlying assumptions and weaknesses. These approaches were developed before the advent of the Big Data deluge, which has overturned the importance of structural simplicity. The concept of computational simplicity is then presented, and it is argued to be more encompassing and closer to actual machine learning practice than the classic concepts. In order to process the huge datasets that are commonplace nowadays, the computational complexity of the learning algorithm is the decisive factor in assessing the viability of a machine learning strategy, while the classic accounts of simplicity play only a surrogate role. Some of the desirable features of computational simplicity derive from its reliance on the concept of a learning system, which integrates key aspects of machine learning that the classic concepts ignore. Moreover, computational simplicity is directly associated with energy efficiency. In particular, the question is considered of whether the maximum achievable predictive accuracy should be pursued regardless of the economic cost of the associated energy consumption.
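To make the scaling argument concrete, here is a minimal sketch, not taken from the paper, that times two common learners on growing synthetic datasets. The choice of models and the use of scikit-learn are assumptions of this illustration: a radial-basis kernel SVM stands in for a method whose training cost grows superlinearly with the number of samples n, while a linear classifier fitted by stochastic gradient descent stands in for a method whose per-epoch cost is linear in n.

```python
# Illustrative sketch, not from the paper: training time as a measurable
# proxy for computational simplicity. Assumes scikit-learn is installed;
# the two learners are hypothetical stand-ins for a computationally heavy
# model (kernel SVM, superlinear training cost) and a computationally
# light one (linear SGD, linear cost per epoch).
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.svm import SVC

for n in (1_000, 2_000, 4_000, 8_000):
    X, y = make_classification(n_samples=n, n_features=50, random_state=0)

    t0 = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)                    # superlinear in n
    t_svm = time.perf_counter() - t0

    t0 = time.perf_counter()
    SGDClassifier(max_iter=5, tol=None).fit(X, y)  # linear in n
    t_sgd = time.perf_counter() - t0

    print(f"n = {n:>5}: kernel SVM {t_svm:6.2f} s | linear SGD {t_sgd:6.2f} s")
```

On such a benchmark, the gap between the two training times widens every time n doubles. This is the sense in which computational simplicity, rather than structural simplicity, decides which learning strategies remain viable at Big Data scale.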

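The economic side of the final question can be illustrated with a back-of-envelope calculation using energy = power x time and cost = energy x price. Every number below is a hypothetical assumption introduced only for this sketch, not a figure from the paper:

```python
# Back-of-envelope sketch, not from the paper: all numbers are hypothetical
# assumptions used only to make the accuracy-versus-cost trade-off concrete.
GPU_POWER_KW = 0.3    # assumed power draw of one accelerator (kW)
PRICE_PER_KWH = 0.15  # assumed electricity price (dollars per kWh)

scenarios = {
    "good-enough model": 24,               # assumed training time (hours)
    "squeeze last ~1% accuracy": 24 * 30,  # assumed much longer training
}

for label, hours in scenarios.items():
    energy_kwh = GPU_POWER_KW * hours      # energy = power x time
    cost = energy_kwh * PRICE_PER_KWH      # cost = energy x price
    print(f"{label:>26}: {hours:>4} h, {energy_kwh:7.1f} kWh, ~${cost:.2f}")
```

Under these assumed numbers, chasing the last increment of predictive accuracy multiplies the energy bill thirtyfold, which is precisely the kind of pattern the abstract's closing question asks us to weigh.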