Bayesian profile regression with an application to the National Survey of Children's Health

Biostatistics - Volume 11, Issue 3 - Pages 484-498 - 2010
John Molitor (1), Michail Papathomas (1), Michael Jerrett (2), Sylvia Richardson (3)
(1) Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK
(2) Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, CA 94720-7360, USA
(3) Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK

Abstract

Standard regression analyses are often plagued with problems encountered when one tries to make inference going beyond main effects using data sets that contain dozens of variables that are potentially correlated. This situation arises, for example, in epidemiology where surveys or study questionnaires consisting of a large number of questions yield a potentially unwieldy set of interrelated data from which teasing out the effect of multiple covariates is difficult. We propose a method that addresses these problems for categorical covariates by using, as its basic unit of inference, a profile formed from a sequence of covariate values. These covariate profiles are clustered into groups and associated via a regression model to a relevant outcome. The Bayesian clustering aspect of the proposed modeling framework has a number of advantages over traditional clustering approaches in that it allows the number of groups to vary, uncovers subgroups and examines their association with an outcome of interest, and fits the model as a unit, allowing an individual's outcome potentially to influence cluster membership. The method is demonstrated with an analysis of survey data obtained from the National Survey of Children's Health. The approach has been implemented using the standard Bayesian modeling software, WinBUGS, with code provided in the supplementary material available at Biostatistics online. Further, interpretation of partitions of the data is helped by a number of postprocessing tools that we have developed.
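The point that fitting the model as a unit lets an individual's outcome influence cluster membership can be illustrated with a small sketch. This is not the authors' WinBUGS code. It assumes a deliberately simplified setting: the number of clusters is fixed at two rather than allowed to vary, the covariates are binary and conditionally independent within a cluster, and the outcome is binary with a cluster-specific log-odds. All names and parameter values (`WEIGHTS`, `PHI`, `THETA`, `membership`) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical two-cluster setup with three binary covariates.
# Every number here is an illustrative assumption, not an estimate from the paper.
WEIGHTS = [0.5, 0.5]          # mixture weight of each cluster
PHI = [                       # PHI[c][j] = P(x_j = 1 | cluster c)
    [0.8, 0.7, 0.6],
    [0.3, 0.4, 0.5],
]
THETA = [1.5, -1.5]           # cluster-specific log-odds of outcome y = 1


def sigmoid(t):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def profile_likelihood(x, c):
    """P(profile x | cluster c), assuming covariates are independent within a cluster."""
    p = 1.0
    for x_j, phi_cj in zip(x, PHI[c]):
        p *= phi_cj if x_j == 1 else 1.0 - phi_cj
    return p


def outcome_likelihood(y, c):
    """P(outcome y | cluster c) under a cluster-specific Bernoulli model."""
    p1 = sigmoid(THETA[c])
    return p1 if y == 1 else 1.0 - p1


def membership(x, y=None):
    """Cluster membership probabilities given profile x and, optionally, outcome y."""
    scores = []
    for c in range(len(WEIGHTS)):
        s = WEIGHTS[c] * profile_likelihood(x, c)
        if y is not None:
            # Joint fitting: the outcome contributes to the membership probability.
            s *= outcome_likelihood(y, c)
        scores.append(s)
    total = sum(scores)
    return [s / total for s in scores]


profile = (1, 0, 1)
covariates_only = membership(profile)        # clustering on the profile alone
with_outcome = membership(profile, y=0)      # clustering on profile and outcome jointly
```

With these assumed values, the profile alone favors the first cluster, while adding an observed outcome of 0 shifts the probability toward the second cluster, whose outcome rate is lower. A two-stage approach that clusters first and regresses afterwards would use only `covariates_only`.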
