Estimation of genomic breeding values using the Horseshoe prior

BMC Proceedings - Volume 8 - Pages 1-6 - 2014
Ricardo Pong-Wong¹
¹ The Roslin Institute and the R(D)SVS, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK

Abstract

A method for estimating genomic breeding values (GEBV) based on the Horseshoe prior was introduced and applied to the 16th QTLMAS workshop dataset, which resembles three milk production traits. The method was compared with five commonly used methods: Bayes A, Bayes B, Bayes C, Bayesian Lasso and GBLUP. The main difference between the methods is the prior distribution assumed for the SNP effects. The Bayesian Lasso assumes a Laplace distribution; Bayes A a Student-t; Bayes B and Bayes C a spike-and-slab prior that combines a proportion of SNP with no effect and a proportion with effects distributed as a Student-t (Bayes B) or a Gaussian (Bayes C); GBLUP is similar to a ridge regression. The density of the Horseshoe prior behaves like log(1 + 1/β²) (up to a constant). It has an infinite spike at zero and a heavy tail that decays as β⁻² (more slowly than the Laplace or the Student-t). All methods except GBLUP were implemented using an MCMC approach, in which the parameters defining the prior distributions were estimated jointly from the data. GBLUP was fitted using ASReml. The accuracy of all methods ranged from 0.74 to 0.83, an improvement of 44% to 78% over the traditional BLUP evaluation. The GEBV with the highest accuracy were obtained with Bayes A, Bayes B and the Horseshoe prior. The Horseshoe tended to select a smaller number of SNP and assign them larger effects, while strongly shrinking the remaining SNP towards zero. The Horseshoe prior thus showed a different shrinkage pattern from the other methods. Although for this specific dataset this had little impact on the accuracy of the GEBV, it may prove a useful property for discriminating true effects from noise and thereby improve overall prediction under different scenarios.
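
The shape of the Horseshoe prior described in the abstract can be illustrated with a short numerical sketch. This is not the MCMC implementation used in the paper. It only evaluates the approximate kernel log(1 + 1/β²) quoted above and draws effects from the usual scale-mixture form of the horseshoe (normal effects with half-Cauchy local scales). The function names `horseshoe_kernel` and `sample_horseshoe`, the global scale `tau = 1` and the unit residual variance are assumptions made for illustration.

```python
import numpy as np


def horseshoe_kernel(beta):
    """Approximate, unnormalised horseshoe density: log(1 + 1/beta^2)."""
    beta = np.asarray(beta, dtype=float)
    return np.log1p(1.0 / beta**2)


def sample_horseshoe(n, tau=1.0, seed=0):
    """Draw n effects from the horseshoe prior as a scale mixture of normals.

    lambda_j ~ half-Cauchy(0, 1), beta_j | lambda_j ~ N(0, (lambda_j * tau)^2).
    tau = 1 is an illustrative assumption, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    lam = np.abs(rng.standard_cauchy(n))       # local scales
    beta = rng.normal(0.0, 1.0, n) * lam * tau  # SNP effects
    return beta, lam


# Spike at zero: the kernel keeps growing as beta approaches 0.
near_zero = horseshoe_kernel([1e-1, 1e-3, 1e-6])

# Heavy tail: beta^2 * kernel approaches 1, i.e. the kernel decays as beta^-2.
tail_ratio = 100.0**2 * horseshoe_kernel(100.0)

# At beta = 10 the horseshoe kernel is far above the Laplace kernel exp(-|beta|).
heavier_than_laplace = horseshoe_kernel(10.0) > np.exp(-10.0)

# Shrinkage weights kappa = 1 / (1 + lambda^2), assuming tau = 1 and unit
# residual variance. Values near 1 mean "shrink to zero", values near 0 mean
# "leave almost unshrunk"; the horseshoe puts much of its mass at both ends.
beta, lam = sample_horseshoe(200_000)
kappa = 1.0 / (1.0 + lam**2)
frac_extreme = np.mean((kappa < 0.1) | (kappa > 0.9))

print(near_zero, tail_ratio, heavier_than_laplace, frac_extreme)
```

Under these assumptions the shrinkage weights follow a Beta(1/2, 1/2) distribution, so a large share of SNP are either shrunk almost completely or left almost untouched. This matches the pattern reported above, where a few SNP receive large effects and the rest are pulled close to zero.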
