Monitoring machine learning models: a categorization of challenges and methods
References
Adler, 2016, Safety engineering for autonomous vehicles, 200
Albarghouthi, 2019, Fairness-aware programming, 211
Alpaydin, 2020
2000
Amershi, 2019, Software engineering for machine learning: a case study, 291
Amershi, 2015, ModelTracker: redesigning performance analysis tools for machine learning, 337
Arpteg, 2018, Software engineering challenges of deep learning, 50
Ashmore, 2019
Balzert, 2009
Barocas, 2016, Big data’s disparate impact, Calif. Law Rev., 104, 671
Barr, 2015, The oracle problem in software testing: a survey, IEEE Trans. Software Eng., 41, 507, 10.1109/TSE.2014.2372785
Bartlett, 2000, Learning changing concepts by exploiting the structure of change, Mach. Learn., 41, 153, 10.1023/A:1007604202679
Baylor, 2017, 1387
Bengio, 2012, Deep learning of representations for unsupervised and transfer learning, 17
Beran, 1977, Minimum Hellinger distance estimates for parametric models, Ann. Stat., 5, 445, 10.1214/aos/1176343842
Bernardi, 2019, 150 successful machine learning models: 6 lessons learned at Booking.com, 1743
Bhatt, 2020
Boehm, 2019, Data management in machine learning systems, Synth. Lectures Data Manag., 14, 1
Bojarski, 2016
Bolukbasi, 2016
Borg, 2018
Breck, 2017, The ML test score: a rubric for ML production readiness and technical debt reduction, 1123
Breck, 2019, Data validation for machine learning
Bridge, 2016, Intraobserver variability: should we worry?, J. Med. Imag. Radiat. Sci., 47, 217, 10.1016/j.jmir.2016.06.004
Cam
Chen, 2013, State of the art: dynamic symbolic execution for automated test generation, Future Generat. Comput. Syst., 29, 1758, 10.1016/j.future.2012.02.006
Cheng, 2017
Chilakapati, 2019
Clark
Clarke, 2012, Model checking and the state explosion problem, 1
Corbett-Davies, 2018
Cramér, 1940, On the theory of stationary random processes, Ann. Math., 41, 215, 10.2307/1968827
Davenport, 2018, Artificial intelligence for the real world, Harv. Bus. Rev.
Devlin, 2019
Doshi-Velez, 2017
Dwork, 2011
Engstrom, 2019
Ernst, 2007, The Daikon system for dynamic detection of likely invariants, Sci. Comput. Program., 69, 35, 10.1016/j.scico.2007.01.015
Esteva, 2017, Dermatologist-level classification of skin cancer with deep neural networks, Nature, 542, 115, 10.1038/nature21056
2016
EU Parliament, 2021
Farquhar, 2019
Finkelstein, 2008, Fairness analysis in requirements assignments, RE’08, 115
Gajane, 2018
Gama, 2014, A survey on concept drift adaptation, ACM Comput. Surv., 46, 10.1145/2523813
Gardiner, 1999
Gass, 1981, Concepts of model confidence, Comput. Oper. Res., 8, 341, 10.1016/0305-0548(81)90019-8
Glorot, 2010, Understanding the difficulty of training deep feedforward neural networks, 249
Guo, 2017
Haldar, 2019, Applying deep learning to Airbnb search
Hall, 2018
Hansson, 2016, Machine learning algorithms in heavy process manufacturing, Am. J. Intell. Syst., 6, 1
Hastie, 2009, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed, Springer, New York
Heckman, 1979, Sample selection bias as a specification error, Econometrica, 47, 153, 10.2307/1912352
Hendrycks, 2019
Herrera
Hüllermeier, 2020
Hornik, 1989, Multilayer feedforward networks are universal approximators, Neural Networks, 2, 359, 10.1016/0893-6080(89)90020-8
Huang, 2017
Hynes, 2017, The data linter: lightweight automated sanity checking for ML data sets, 1
1990, 1
Jagielski, 2018
Japkowicz, 2002, The class imbalance problem: a systematic study, Intell. Data Anal., 6, 429, 10.3233/IDA-2002-6504
Jones, 2019, Predicts 2020: data and analytics strategies—invest, influence and impact
Jordan, 2015, Machine learning: trends, perspectives, and prospects, Science, 349, 255, 10.1126/science.aaa8415
Kanewala, 2018
Katz, 2017
King, 1976, Symbolic execution and program testing, Commun. ACM, 19, 385, 10.1145/360248.360252
Kitchenham, 1989, A quantitative approach to monitoring software development, Software Eng. J., 4, 2, 10.1049/sej.1989.0001
Klaise, 2020
Kodiyan
Krishnan, 2017
Kullback, 1951, On information and sufficiency, Ann. Math. Stat., 22, 79, 10.1214/aoms/1177729694
Kumar, 2020
Kusner, 2018
Lakshminarayanan, 2017
L’Heureux, 2017, Machine learning with big data: challenges and approaches, IEEE Access, 5, 7776, 10.1109/ACCESS.2017.2696365
Ma, 2019
McGregor, 2020, Preventing repeated real world AI failures by cataloging incidents: the AI incident database, 15458
McMahan, 2013, Ad click prediction: a view from the trenches, 1222
Miljković, 2010, Review of novelty detection methods, 593
Miller, 2018
Moreno-Torres, 2012, A unifying view on dataset shift in classification, Pattern Recogn., 45, 521, 10.1016/j.patcog.2011.06.019
Morgenthaler, 2012, Searching for build debt: experiences managing technical debt at Google, 1
Murphy, 2007, An approach to software testing of machine learning applications, 1
Ovadia, 2019
Paleyes, 2020
Pei, 2017, DeepXplore: automated whitebox testing of deep learning systems, 1
Peled, 1999, Black box checking, 225
Pham, 2006, Introduction, System Software Reliability, 1
Polyzotis, 2017, Data management challenges in production machine learning, 1723
Quiñonero-Candela, 2009
Ré, 2019
Ribeiro, 2016
Salay, 2018
Schelter, 2018, On challenges in machine learning model management, IEEE Data Eng. Bull., 41, 5
Schelter, 2018, Automating large-scale data quality verification, Proc. VLDB Endowment, 11, 1781, 10.14778/3229863.3229867
Schlimmer, 1986, Incremental learning from noisy data, Mach. Learn., 1, 317
Schubmehl, 2020
Sculley, 2015, Hidden technical debt in machine learning systems, 2503
Sharir, 2020
Spanfelner
Spiegelhalter, 2002, Bayesian measures of model complexity and fit, J. Roy. Stat. Soc., 64, 583, 10.1111/1467-9868.00353
Stoica, 2017
Sun, 2019
Sutton, 2018, Data diff: interpretable, executable summaries of changes in distributions for data wrangling, 2279
Tallarida, 1987, Chi-square test, 140
Tan, 2018
Touvron, 2020
Tripathi, 2020
Vartak, 2018, MODELDB: opportunities and challenges in managing machine learning models, IEEE Data Eng. Bull., 41, 16
Vartak, 2016, ModelDB: a system for machine learning model management, 1
Vaswani, 2017, Attention is all you need, 6000
Verma, 2018, Fairness definitions explained, 1
Voas, 1995, Software testability: the new verification, IEEE Software, 12, 17, 10.1109/52.382180
Wagstaff, 2019, Enabling onboard detection of events of scientific interest for the Europa Clipper spacecraft, 2191
Webb, 2016, Characterizing concept drift, Data Min. Knowl. Discov., 30, 964, 10.1007/s10618-015-0448-4
Willmott, 2005, Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance, Clim. Res., 30, 79, 10.3354/cr030079
Wolf, 2017, Why we should have seen that coming: comments on Microsoft’s Tay “experiment”, and wider implications, ORBIT J., 1, 1, 10.29297/orbit.v1i2.49
Yang, 2020
Yu, 2004, Efficient feature selection via analysis of relevance and redundancy, J. Mach. Learn. Res., 5, 1205
Zafar, 2017
Zaharia, 2018, Accelerating the machine learning lifecycle with MLflow, IEEE Data Eng. Bull., 41, 39
Zhang, 2019
Zhao, 2019
Zhu, 1997, Software unit test coverage and adequacy, ACM Comput. Surv., 29, 366, 10.1145/267580.267590