Supersparse linear integer models for optimized medical scoring systems

Berk Ustun1, Cynthia Rudin2
1Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA
2Sloan School of Management and CSAIL, Massachusetts Institute of Technology, Cambridge, MA, USA

Abstract

Keywords


References

Antman, E. M., Cohen, M., Bernink, P. J. L. M., McCabe, C. H., Horacek, T., Papuchis, G., et al. (2000). The TIMI risk score for unstable angina/non-ST elevation MI. The Journal of the American Medical Association, 284(7), 835–842.

Asparoukhov, O. K., & Stam, A. (1997). Mathematical programming formulations for two-group classification with binary variables. Annals of Operations Research, 74, 89–112.

Bache, K., & Lichman, M. (2013). UCI machine learning repository.

Bajgier, S. M., & Hill, A. V. (1982). An experimental comparison of statistical and linear programming approaches to the discriminant problem. Decision Sciences, 13(4), 604–618.

Bien, J., Taylor, J., Tibshirani, R., et al. (2013). A lasso for hierarchical interactions. The Annals of Statistics, 41(3), 1111–1141.

Bone, R. C., Balk, R. A., Cerra, F. B., Dellinger, R. P., Fein, A. M., Knaus, W. A., et al. (1992). American college of chest physicians/society of critical care medicine consensus conference: Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Critical Care Medicine, 20(6), 864–874.

Bousquet, O., Boucheron, S., & Lugosi, G. (2004). Introduction to statistical learning theory. In Advanced lectures on machine learning. Springer, pp. 169–207.

Bradley, P. S., Fayyad, U. M., & Mangasarian, O. L. (1999). Mathematical programming for data mining: Formulations and challenges. INFORMS Journal on Computing, 11(3), 217–238.

Brooks, J. P. (2011). Support vector machines with the ramp loss and the hard margin loss. Operations Research, 59(2), 467–479.

Brooks, J. P., & Lee, E. K. (2010). Analysis of the consistency of a mixed integer programming-based multi-category constrained discriminant model. Annals of Operations Research, 174(1), 147–168.

Carrizosa, E., Martín-Barragán, B., & Morales, D. R. (2010). Binarized support vector machines. INFORMS Journal on Computing, 22(1), 154–167.

Carrizosa, E., Nogales-Gómez, A., & Morales, D. R. (2013). Strongly agree or strongly disagree? Rating features in support vector machines. Technical report, Saïd Business School, University of Oxford, UK

Chevaleyre, Y., Koriche, F., & Zucker, J.-D. (2013). Rounding methods for discrete linear classification. In Proceedings of the 30th international conference on machine learning (ICML-13), pp. 651–659.

Cranor, L. F., & LaMacchia, B. A. (1998). Spam! Communications of the ACM, 41(8), 74–83.

Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.-J., Sandhu, S., et al. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. The American Journal of Cardiology, 64(5), 304–310.

Dupačová, J., Consigli, G., & Wallace, S. W. (2000). Scenarios for multistage stochastic programs. Annals of Operations Research, 100(1–4), 25–53.

Dupačová, J., Gröwe-Kuska, N., & Römisch, W. (2003). Scenario reduction in stochastic programming. Mathematical Programming, 95(3), 493–511.

Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.

Elter, M., Schulz-Wendtland, R., & Wittenberg, T. (2007). The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Medical Physics, 34(11), 4164–4172.

Freitas, A. A. (2014). Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter, 15(1), 1–10.

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.

Gage, B. F., Waterman, A. D., Shannon, W., Boechler, M., Rich, M. W., & Radford, M. J. (2001). Validation of clinical classification schemes for predicting stroke. The Journal of the American Medical Association, 285(22), 2864–2870.

Glen, J. J. (1999). Integer programming methods for normalisation and variable selection in mathematical programming discriminant analysis models. Journal of the Operational Research Society, 50, 1043–1053.

Goh, S. T., & Rudin, C. (2014). Box drawings for learning with imbalanced data. In Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp. 333–342.

Goldberg, N., & Eckstein, J. (2010). Boosting classifiers with tightened l0-relaxation penalties. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), pp. 383–390.

Goldberg, N., & Eckstein, J. (2012). Sparse weighted voting classifier selection and its linear programming relaxations. Information Processing Letters, 112, 481–486.

Guan, W., Gray, A., & Leyffer, S. (2009). Mixed-integer support vector machine. In NIPS workshop on optimization for machine learning.

Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. The Journal of Machine Learning Research, 3, 1157–1182.

Haberman, S. J. (1976). Generalized residuals for log-linear models. In Proceedings of the 9th international biometrics conference, Boston, pp. 104–122.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (Vol. 2). New York: Springer.

Jenatton, R., Audibert, J.-Y., & Bach, F. (2011). Structured variable selection with sparsity-inducing norms. The Journal of Machine Learning Research, 12, 2777–2824.

Jennings, D., Amabile, T. M., & Ross, L. (1982). Informal covariation assessment: Data-based vs. theory-based judgments. Judgment under uncertainty: Heuristics and biases, pp. 211–230.

Joachimsthaler, E. A., & Stam, A. (1990). Mathematical programming approaches for the classification problem in two-group discriminant analysis. Multivariate Behavioral Research, 25(4), 427–454.

Kapur, V. K. (2010). Obstructive sleep apnea: Diagnosis, epidemiology, and economics. Respiratory Care, 55(9), 1155–1167.

Kim, M.-J., & Han, I. (2003). The discovery of experts’ decision rules from qualitative bankruptcy data using genetic algorithms. Expert Systems with Applications, 25(4), 637–646.

Knaus, W. A., Zimmerman, J. E., Wagner, D. P., Draper, E. A., & Lawrence, D. E. (1981). APACHE-acute physiology and chronic health evaluation: a physiologically based classification system. Critical Care Medicine, 9(8), 591–597.

Knaus, W. A., Draper, E. A., Wagner, D. P., & Zimmerman, J. E. (1985). APACHE II: a severity of disease classification system. Critical Care Medicine, 13(10), 818–829.

Knaus, W. A., Wagner, D. P., Draper, E. A., Zimmerman, J. E., Bergner, M., Bastos, P. G., et al. (1991). The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest Journal, 100(6), 1619–1636.

Kodratoff, Y. (1994). The comprehensibility manifesto. KDD Nugget Newsletter, 94, 9.

Kohavi, R. (1996). Scaling up the accuracy of naive-bayes classifiers: A decision-tree hybrid. In KDD, pp. 202–207.

Kuhn, M., Weston, S., & Coulter, N. (2012). C50: C5.0 Decision trees and rule-based models, 2012. C code for C5.0 by R. Quinlan. R package version 0.1.0-013.

Le Gall, J.-R., Loirat, P., Alperovitch, A., Glaser, P., Granthil, C., Mathieu, D., et al. (1984). A simplified acute physiology score for ICU patients. Critical Care Medicine, 12(11), 975–977.

Le Gall, J.-R., Lemeshow, S., & Saulnier, F. (1993). A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study. The Journal of the American Medical Association, 270(24), 2957–2963.

Lee, E. K., & Wu, T.-L. (2009). Classification and disease prediction via mathematical programming. In Handbook of optimization in medicine. Springer, pp. 1–50.

Li, L., & Lin, H.-T. (2007). Optimizing 0/1 loss for perceptrons by random coordinate descent. In International joint conference on neural networks, 2007. IJCNN 2007. IEEE, pp. 749–754.

Light, R. W., Macgregor, M. I., Luchsinger, P. C., & Ball, W. C. (1972). Pleural effusions: The diagnostic separation of transudates and exudates. Annals of Internal Medicine, 77(4), 507–513.

Liittschwager, J. M., & Wang, C. (1978). Integer programming solution of a classification problem. Management Science, 24, 1515–1525.

Lin, D., Pitler, E., Foster, D. P., & Ungar, L. H. (2008). In defense of l0. In Workshop on feature selection (ICML 2008).

Liu, H., Hussain, F., Tan, C. L., & Dash, M. (2002). Discretization: An enabling technique. Data Mining and Knowledge Discovery, 6, 393–423.

Liu, H., & Zhang, J. (2009). Estimation consistency of the group lasso and its applications. In Proceedings of the twelfth international conference on artificial intelligence and statistics.

Mangasarian, O. L. (1994). Misclassification minimization. Journal of Global Optimization, 5(4), 309–323.

Mangasarian, O. L., Street, W. N., & Wolberg, W. H. (1995). Breast cancer diagnosis and prognosis via linear programming. Operations Research, 43(4), 570–577.

Mao, K. Z. (2004). Orthogonal forward selection and backward elimination algorithms for feature subset selection. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 34(1), 629–634.

Marklof, J. (2012, July). Fine-scale statistics for the multidimensional Farey sequence. arXiv e-prints.

Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., & Leisch, F. (2012). e1071: Misc functions of the department of statistics (e1071), TU Wien, 2012. R package version 1.6-1.

Miller, A. J. (1984). Selection of subsets of regression variables. Journal of the Royal Statistical Society Series A (General), 147, 389–425.

Moreno, R. P., Metnitz, P. G. H., Almeida, E., Jordan, B., Bauer, P., Campos, R. A., et al. (2005). SAPS 3—From evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Medicine, 31(10), 1345–1355.

Nguyen, H. T., & Franke, K. (2012). A general lp-norm support vector machine via mixed 0-1 programming. In Machine learning and data mining in pattern recognition. Springer, pp. 40–49.

Nguyen, T., & Sanner, S. (2013). Algorithms for direct 0–1 loss optimization in binary classification. In Proceedings of the 30th international conference on machine learning (ICML-13), pp. 1085–1093.

Pazzani, M. J. (2000). Knowledge discovery from data? IEEE Intelligent Systems and Their Applications, 15(2), 10–12.

R Core Team. (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org/.

Ranson, J. H., Rifkind, K. M., Roses, D. F., Fink, S. D., Eng, K., Spencer, F. C., et al. (1974). Prognostic signs and the role of operative management in acute pancreatitis. Surgery, Gynecology & Obstetrics, 139(1), 69.

Rubin, P. A. (1990). Heuristic solution procedures for a mixed-integer programming discriminant model. Managerial and Decision Economics, 11, 255–266.

Rubin, P. A. (1997). Solving mixed integer classification problems by decomposition. Annals of Operations Research, 74, 51–64.

Rubin, P. A. (2009). Mixed integer classification problems. In Encyclopedia of optimization. Springer, pp. 2210–2214.

Schlimmer, J. C. (1987). Concept acquisition through representational adjustment.

Souillard-Mandar, W., Davis, R., Rudin, C., Au, R., Libon, D. J., Swenson, R., et al. (2015). Learning classification models of cognitive conditions from subtle behaviors in the digital clock drawing test. Machine Learning. Accepted.

Therneau, T., Atkinson, B., & Ripley, B. (2012). rpart: Recursive Partitioning. R package version 4.1-0. URL http://CRAN.R-project.org/package=rpart.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological), 58, 267–288.

Towell, G. G., & Shavlik, J. W. (1993). Extracting refined rules from knowledge-based neural networks. Machine Learning, 13, 71–101.

Ustun, B., Westover, M. B., Rudin, C., & Bianchi, M. T. (2015). Clinical prediction models for sleep apnea: The importance of medical history over symptoms. Journal of Clinical Sleep Medicine.

Van Belle, V., Neven, P., Harvey, V., Van Huffel, S., Suykens, J. A. K., & Boyd, S. (2013). Risk group detection and survival function estimation for interval coded survival methods. Neurocomputing, 112, 200–210.

Vapnik, V. (1998). Statistical Learning Theory. New York: Wiley.

Wells, P. S., Anderson, D. R., Bormanis, J., Guy, F., Mitchell, M., Gray, L., et al. (1997). Value of assessment of pretest probability of deep-vein thrombosis in clinical management. Lancet, 350(9094), 1795–1798.

Wells, P. S., Anderson, D. R., Rodger, M., Ginsberg, J. S., Kearon, C., Gent, M., et al. (2000). Derivation of a simple clinical model to categorize patients probability of pulmonary embolism: Increasing the models utility with the SimpliRED D-dimer. Thrombosis and Haemostasis, 83(3), 416–420.

Wolsey, L. A. (1998). Integer programming (Vol. 42). New York: Wiley.

Yanev, N., & Balev, S. (1999). A combinatorial approach to the classification problem. European Journal of Operational Research, 115(2), 339–350.

Zhao, P., & Yu, B. (2006). On model selection consistency of lasso. Journal of Machine Learning Research, 7, 2541–2563.

Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.

Zeng, J., Ustun, B., & Rudin, C. (2015). Interpretable classification models for recidivism prediction. arXiv preprint arXiv:1503.07810.