Monitoring machine learning models: a categorization of challenges and methods

Data Science and Management - Volume 5 - Pages 105-116 - 2022
Tim Schröder¹, Michael Schulz¹
¹Department of Computer Science, Nordakademie Hochschule der Wirtschaft, Van-der-Smissen-Straße 9, Hamburg, 22767, Germany
