Genomic Prediction Models for Count Data

Journal of Agricultural, Biological and Environmental Statistics - Tập 20 - Trang 533-554 - 2015

Osval A. Montesinos-López¹, Abelardo Montesinos-López², Paulino Pérez-Rodríguez³, Kent Eskridge⁴, Xinyao He⁵, Philomin Juliana⁶, Pawan Singh⁵, José Crossa⁷

¹Biometrics and Statistics Unit of the International Maize and Wheat Improvement Center (CIMMYT), México, México

²Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, Guanajuato, México

³Colegio de Postgraduados, Montecillos, México

⁴Deparment of Statistics at the University of Nebraska, Lincoln, USA

⁵Global Wheat Breeding Program of CIMMYT, México, México

⁶Plant Breeding & Genetics, Cornell University, Ithaca, USA

⁷Biometrics and Statistics Unit of CIMMYT, Texcoco, México

Tóm tắt

Whole genome prediction models are useful tools for breeders when selecting candidate individuals early in life for rapid genetic gains. However, most prediction models developed so far assume that the response variable is continuous and that its empirical distribution can be approximated by a Gaussian model. A few models have been developed for ordered categorical phenotypes, but there is a lack of genomic prediction models for count data. There are well-established regression models for count data that cannot be used for genomic-enabled prediction because they were developed for a large sample size (n) and a small number of parameters (p); however, the rule in genomic-enabled prediction is that p is much larger than the sample size n. Here we propose a Bayesian mixed negative binomial (BMNB) regression model for counts, and we present the conditional distributions necessary to efficiently implement a Gibbs sampler. The proposed Bayesian inference can be implemented routinely. We evaluated the proposed BMNB model together with a Poisson model, a Normal model with untransformed response, and a Normal model with transformed response using a logarithm, and applied them to two real wheat datasets from the International Maize and Wheat Improvement Center. Based on the criteria used for assessing genomic prediction accuracy, results indicated that the BMNB model is a viable alternative for analyzing count data.

Tài liệu tham khảo

Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422): 669-679. Boone, E. L., Stewart-Koster, B., & Kennard, M. J. (2012). A hierarchical zero-inflated Poisson regression model for stream fish distribution and abundance. Environmetrics, 23(3), 207-218. de los Campos, G., Gianola, D., & Allison, D. B. (2010). Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat Rev Genet, 11: 880-886. doi:10.1038/nrg2898. de los Campos, G., Vazquez, A. I., Fernando, R., Klimentidis, Y. C., & Sorensen, D. (2013a). Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor. PLoS Genetics 9 (7) e1003608. de los Campos, G., Hickey, J. M., Pong-Wong, R., Daetwyler, H. D., & Calus, M. P. L. (2013b). Whole Genome Regression and Prediction Methods Applied to Plant and Animal Breeding. Genetics, 193(2), 327-345. Gelfand, A. E. (1996). Model determination using sampling-based methods. In: Gilks, W. R., Richardson, S., & Spiegelhalter, D. J., editors. Markov Chain Monte Carlo in practice. London: Chapman & Hall. Pp. 145-60. Gelfand, A. E., & Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398-409. Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis. 2. Boca Raton: Chapman & Hall. Geyer, C. J. (1992). Practical Markov Chain Monte Carlo. Statistical Science, 473-483. Goddard, M. E., & Hayes, B. J. (2009). Mapping genes for complex traits in domestic animals and their use in breeding programmes. Nat Rev Genet, 10: 381-391. doi:10.1038/nrg2575. Kärkkäinen, H. P., & Sillanpää, M. J. (2012). Back to basics for Bayesian model building in genomic selection. Genetics, 191(3), 969-987. Kizilkaya, K., Fernando, R. L., & Garrick, D. J. (2014). Reduction in accuracy of genomic prediction for ordered categorical data compared to continuous observations. Genetics Selection Evolution, 46:37 doi:10.1186/1297-9686-46-37. Laud, P. W., & Ibrahim, J. G. (1995). Predictive Model Selection. Journal of the Royal Statistical Society, B 57, pp. 247-262. Link, W. A., & Eaton, M. J. (2012). On thinning of chains in MCMC. Methods in Ecology and Evolution, 3(1), 112-115. MacEachern, S. N., & Berliner, L. M. (1994). Subsampling the Gibbs sampler. The American Statistician, 48(3), 188-190. Montesinos-López, O. A., Montesinos-López, A., Pérez-Rodríguez, P., de los Campos, G., Eskridge, K. M., & Crossa, J. (2015). Threshold models for genome-enabled prediction of ordinal categorical traits in plant breeding. G3: Genes| Genomes| Genetics, 5(1), 1-10. Park, T., & van Dyk, D. A. (2009). Partially collapsed Gibbs samplers: Illustrations and applications. Journal of Computational and Graphical Statistics, 18(2), 283-305. Polson, N. G., Scott, J. G., & Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504), 1339-1349. Poland, J.A., Brown, P.J., Sorrells, M.E., Jannink J.-L. 2012. Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PloS ONE, 7:e32253. Quenouille, M. H. (1949). A relation between the logarithmic, Poisson, and negative binomial series. Biometrics, 5(2), 162-164. R Core Team. (2015). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org/. Riedelsheimer, C., Czedik-Eysenberg, A., Grieder, C., Lisec, J., Technow, F., et al. (2012). Genomic and metabolic prediction of complex heterotic traits in hybrid maize. Nat Genet 44: 217-220. doi:10.1038/ng.1033. Scott, J., & Pillow, J. W. (2013). Fully Bayesian inference for neural models with negative-binomial spiking. In Advances in neural information processing systems, pp. 1898-1906. Spiegelhalter, D. J., Mejor, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society, B 64, pp. 583-639. Stroup, W. W. (2015). Rethinking the Analysis of Non-Normal Data in Plant and Soil Science. Agronomy Journal, 107(2): 811-827. VanRaden, P. M. (2007). Genomic measures of relationship and inbreeding. Interbull Bull 37: 33-36. ——– (2008). Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414-4423. Windle, J., Carvalho, C. M., Scott, J. G., & Sun, L. (2013). Pólya–Gamma Data Augmentation for Dynamic Models. arXiv preprint arXiv:1308.0774. Zhang, Z., Ober, U., Erbe, M., Zhang, H., Gao, N., He, J., & Simianer, H. (2014). Improving the accuracy of whole genome prediction for complex traits using the results of genome-wide association studies. PloS One, 9(3), e93017. Zhou, M., & Carin, L. (2015). Negative binomial process count and mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(2), 307-320. Zhou, M., Li, L., Dunson, D., & Carin, L. (2012). Lognormal and gamma mixed negative binomial regression. In Machine Learning: Proceedings of the International Conference on Machine Learning (vol. 2012, p. 1343). NIH Public Access.

Scholar Hub - Công cụ hỗ trợ trích dẫn và phân tích khoa học Việt Nam

Về chúng tôi

Scholar Hub là công cụ hỗ trợ trích dẫn và phân tích các bài báo, công bố khoa học Việt Nam. Công cụ trợ giúp người nghiên cứu, tạp chí, đơn vị nghiên cứu tra cứu, phân tích và thống kê dữ liệu nghiên cứu khoa học tại Việt Nam và quốc tế.
ScholarHub KHÔNG đăng thông tin tổng hợp, KHÔNG đăng lại nội dung từ các trang báo chí Việt Nam hoặc trang thông tin điện tử khác tại Việt Nam.

Thông tin, cập nhật

Đăng ký Tạp chí tham gia vào Scholar Hub

Phản hồi ý kiến về Scholar Hub

Bài viết, nội dung cập nhật

Chủ đề khoa học

Website liên kết

Hệ thống CSDL Khoa học & Công nghệ

Phần mềm kiểm tra trùng lặp Kiểm Tra Tài Liệu

Phần mềm xuất bản tạp chí điện tử VOJS

Nền tảng trắc nghiệm và đề thi đa lĩnh vực LetQA