Prediction intervals for future BMI values of individual children: a non-parametric approach by quantile boosting
Abstract
Constructing prediction intervals (PIs) for future body mass index (BMI) values of individual children is problematic for standard parametric approaches, because the BMI distribution in childhood is typically skewed and the degree of skewness depends on age. We study this problem using data from a recent German birth cohort study with n = 2007 children. We avoid distributional assumptions by modelling the borders of the PIs directly with additive quantile regression, estimated by boosting. We highlight the concept of conditional coverage as the criterion for assessing the accuracy of PIs. Because conditional coverage can hardly be evaluated in practical applications, we first conduct a simulation study. We then fit child- and covariate-specific PIs for future BMI values and BMI patterns to the present data. The results of the simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and, as expected, increase with the age of the child. Quantile boosting is a promising approach for constructing PIs with correct conditional coverage in a non-parametric way. It is particularly suitable for predicting BMI patterns that depend on covariates, since it provides an interpretable predictor structure and inherent variable selection, and it can even account for longitudinal data structures.
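The core idea is to treat each PI border as a conditional quantile and fit it by boosting the check loss. It can be illustrated in a few lines. What follows is a minimal sketch under simplifying assumptions, not the authors' implementation, which fits additive (smooth) effects.

The assumptions are:
- it uses pure Python and componentwise linear base learners;
- it uses simulated data with right-skewed errors whose spread grows with a single covariate x, a hypothetical stand-in for age;
- it adds one uninformative covariate z;
- all function names, the data-generating model, the step length nu = 0.1 and the fixed number of iterations mstop = 3000 are illustrative choices.

In each iteration, every base learner is fitted to the negative gradient of the check loss. That gradient equals tau for observations on or above the current fit and tau - 1 below it. Only the best-fitting base learner is updated, which is where the inherent variable selection comes from.

Conditional coverage means P(Y in [q_{alpha/2}(x), q_{1-alpha/2}(x)] | X = x) = 1 - alpha for every x. Because it cannot be checked pointwise, the sketch approximates it crudely by the empirical coverage within two bins of x.

```python
import random


def check_loss_ngradient(y, f, tau):
    # Negative gradient of the check loss rho_tau(y - f) with respect to f:
    # tau where the observation lies on or above the fit, tau - 1 below it.
    return [tau if yi >= fi else tau - 1.0 for yi, fi in zip(y, f)]


def empirical_quantile(values, tau):
    # Simple order-statistic quantile (no interpolation).
    s = sorted(values)
    return s[min(len(s) - 1, int(tau * len(s)))]


def quantile_boost(X, y, tau, mstop=3000, nu=0.1):
    # Componentwise gradient boosting for the conditional tau-quantile.
    # X is a list of covariate columns; each base learner is a least-squares
    # line (intercept + slope) in a single covariate.
    n, p = len(y), len(X)
    means = [sum(col) / n for col in X]
    centered = [[v - m for v in col] for col, m in zip(X, means)]
    sxx = [sum(v * v for v in col) for col in centered]
    intercept = empirical_quantile(y, tau)  # offset: unconditional quantile
    slopes, counts = [0.0] * p, [0] * p
    f = [intercept] * n
    for _ in range(mstop):
        u = check_loss_ngradient(y, f, tau)
        ubar = sum(u) / n
        # Least-squares slope of u on each centered covariate.
        betas = [sum(c * ui for c, ui in zip(col, u)) / s
                 for col, s in zip(centered, sxx)]
        # Best base learner = largest reduction in residual sum of squares.
        j = max(range(p), key=lambda k: betas[k] ** 2 * sxx[k])
        a, b = ubar - betas[j] * means[j], betas[j]
        intercept += nu * a
        slopes[j] += nu * b
        counts[j] += 1
        f = [fi + nu * (a + b * xj) for fi, xj in zip(f, X[j])]
    return intercept, slopes, counts


def predict(model, row):
    intercept, slopes, _ = model
    return intercept + sum(s * v for s, v in zip(slopes, row))


def simulate(n, rng):
    # Hypothetical data: right-skewed errors whose spread grows with x.
    xs = [rng.random() for _ in range(n)]
    zs = [rng.random() for _ in range(n)]  # uninformative covariate
    ys = [1 + 2 * x + (0.5 + x) * (rng.expovariate(1.0) - 1) for x in xs]
    return xs, zs, ys


rng = random.Random(1)
x, z, y = simulate(400, rng)
alpha = 0.10  # nominal 90% PI
lower = quantile_boost([x, z], y, alpha / 2)
upper = quantile_boost([x, z], y, 1 - alpha / 2)

# Coverage on fresh data, overall and within two bins of x
# (a crude empirical proxy for conditional coverage).
xt, zt, yt = simulate(4000, rng)
hits = {"low": [], "high": []}
lengths = {"low": [], "high": []}
for xi, zi, yi in zip(xt, zt, yt):
    lo, hi = predict(lower, (xi, zi)), predict(upper, (xi, zi))
    bin_ = "low" if xi < 0.5 else "high"
    hits[bin_].append(lo <= yi <= hi)
    lengths[bin_].append(hi - lo)

overall = sum(sum(h) for h in hits.values()) / len(yt)
coverage = {k: sum(h) / len(h) for k, h in hits.items()}
mean_length = {k: sum(v) / len(v) for k, v in lengths.items()}
print("overall coverage:", round(overall, 3))
print("coverage by x bin:", {k: round(c, 3) for k, c in coverage.items()})
print("mean PI length by x bin:", {k: round(m, 2) for k, m in mean_length.items()})
print("base-learner selections (x, z), upper bound:", upper[2])
```

If both borders are estimated well, the overall and bin-wise coverage should lie near the nominal 0.90. The mean PI length should also be larger in the high-x bin, mirroring the age-dependent widening described in the abstract.

Two choices in the sketch are simplifications:
- In practice mstop would be tuned, for example by cross-validation, because it controls both shrinkage and variable selection.
- Replacing the linear base learners with smooth ones, such as penalized splines, would give the additive predictor structure used in the paper.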
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/12/6/prepub