Methods of tagSNP selection and other variables affecting imputation accuracy in swine

BMC Genetics - Volume 14 - Pages 1-14 - 2013
Yvonne M Badke¹, Ronald O Bates¹, Catherine W Ernst¹, Clint Schwab², Justin Fix³, Curtis P Van Tassell⁴, Juan P Steibel¹,⁵
¹Department of Animal Science, Michigan State University, East Lansing, USA
²The Maschhoffs, Carlyle, USA
³National Swine Registry, West Lafayette, USA
⁴Bovine Functional Genomics Laboratory, Agricultural Research Service, United States Department of Agriculture, Beltsville, USA
⁵Department of Fisheries & Wildlife, Michigan State University, East Lansing, USA

Abstract

Genotype imputation is a cost-efficient alternative to high-density genotyping for implementing genomic selection. The objective of this study was to investigate variables affecting imputation accuracy from low-density tagSNP sets in swine (average distance between tagSNP of 100 kb to 1 Mb), with tagSNP selected using linkage disequilibrium (LD) information, physical location, or accuracy for genotype imputation. We compared imputation accuracy across several low-density tagSNP sets of varying density, selected using these three methods. In addition, we assessed the effect of the size and composition of the reference panel of haplotypes used for imputation. A density of at least 1 tagSNP per 340 kb (∼7000 tagSNP), selected using pairwise LD information, was necessary to achieve an average imputation accuracy higher than 0.95. A commercial low-density (9K) tagSNP set for swine was developed concurrently with this study, and the average imputation accuracy based on these tagSNP was estimated at 0.951. Construction of a reference haplotype panel was most efficient when the haplotypes were obtained from randomly sampled individuals. Increasing the size of the original reference haplotype panel (128 haplotypes sampled from 32 sire/dam/offspring trios phased in a previous study) increased overall imputation accuracy (IA = 0.97 with 512 haplotypes), and was especially useful for increasing the imputation accuracy of SNP with minor allele frequency (MAF) below 0.1 and of SNP located at the chromosome extremes (within 5% of the chromosome end). The new commercially available 9K tagSNP set can be used to obtain imputed genotypes with high accuracy, even when imputation is based on a comparatively small panel of reference haplotypes (128 haplotypes). Average imputation accuracy can be further increased by adding haplotypes to the reference panel. 
In addition, our results show that randomly sampling individuals to genotype for the construction of a reference haplotype panel is more cost-efficient than specifically sampling older animals or trios, with no observed loss in imputation accuracy. Based on the observed accuracy and on results reported in dairy cattle, where the genomic evaluation of some individuals is based on genotypes imputed with the same accuracy as observed in our Yorkshire population, we expect that the use of imputed genotypes in swine breeding will yield highly accurate predictions of genomic estimated breeding values (GEBV).
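
The tagSNP selection criteria and the accuracy measure summarized above can be illustrated with a minimal Python/NumPy sketch. It is not the study's pipeline: the greedy pairwise-r² rule, the r² threshold of 0.8, the use of genotype-based r² as a stand-in for haplotype LD, the evenly spaced physical grid, and the use of genotype concordance as the imputation accuracy metric are all assumptions made for illustration, and every function name is hypothetical. The imputation step itself, and selection on imputation accuracy (the third criterion), are not shown.

```python
import numpy as np


def r2_matrix(geno: np.ndarray) -> np.ndarray:
    """Pairwise squared correlation (r^2) between SNP columns.

    geno: individuals x SNPs matrix of allele dosages (0, 1, 2).
    Assumption: genotype-based r^2 stands in for haplotype-based LD.
    """
    g = geno - geno.mean(axis=0)          # centre each SNP
    norm = np.sqrt((g ** 2).sum(axis=0))
    norm[norm == 0] = np.inf              # monomorphic SNPs correlate with nothing
    g = g / norm
    r2 = (g.T @ g) ** 2
    np.fill_diagonal(r2, 1.0)             # every SNP tags itself
    return r2


def greedy_ld_tags(geno: np.ndarray, threshold: float = 0.8) -> list:
    """Hypothetical greedy selection: repeatedly pick the SNP that tags the
    most still-untagged SNPs at r^2 >= threshold (threshold is an assumption)."""
    covers = r2_matrix(geno) >= threshold     # covers[i, j]: SNP i tags SNP j
    untagged = np.ones(geno.shape[1], dtype=bool)
    tags = []
    while untagged.any():
        gain = (covers & untagged).sum(axis=1)  # untagged SNPs each candidate would tag
        best = int(np.argmax(gain))
        tags.append(best)
        untagged &= ~covers[best]
    return sorted(tags)


def evenly_spaced_tags(positions, spacing_bp: int) -> list:
    """Physical-location selection: the SNP nearest to each point of a grid
    spaced spacing_bp base pairs apart."""
    pos = np.asarray(positions)
    grid = np.arange(pos.min(), pos.max() + 1, spacing_bp)
    nearest = np.abs(pos[None, :] - grid[:, None]).argmin(axis=1)
    return sorted(set(nearest.tolist()))


def imputation_accuracy(true_geno: np.ndarray, imputed_geno: np.ndarray) -> np.ndarray:
    """Per-SNP proportion of individuals whose genotype was imputed correctly."""
    return (true_geno == imputed_geno).mean(axis=0)


def minor_allele_freq(geno: np.ndarray) -> np.ndarray:
    """MAF per SNP, computed from 0/1/2 dosages."""
    p = geno.mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)


def accuracy_by_maf(true_geno, imputed_geno, cutoff: float = 0.1):
    """Mean per-SNP accuracy for SNPs with MAF < cutoff and MAF >= cutoff
    (NaN when a group is empty)."""
    acc = imputation_accuracy(true_geno, imputed_geno)
    low = minor_allele_freq(true_geno) < cutoff

    def mean_or_nan(x):
        return float(x.mean()) if x.size else float("nan")

    return mean_or_nan(acc[low]), mean_or_nan(acc[~low])


if __name__ == "__main__":
    # Toy data: SNPs 0 and 1 are identical (perfect LD); SNP 2 is weakly correlated.
    geno = np.array([[0, 0, 2], [1, 1, 0], [2, 2, 1], [0, 0, 1], [1, 1, 2]])
    print(greedy_ld_tags(geno))                                    # [0, 2]
    print(evenly_spaced_tags([100, 250, 900, 1500, 2100], 1000))   # [0, 2, 4]
    imputed = geno.copy()
    imputed[0, 2] = 1                                              # one imputation error
    print(imputation_accuracy(geno, imputed).tolist())             # [1.0, 1.0, 0.8]
```

Building the full SNP-by-SNP r² matrix is only practical for small marker sets; a realistic version would probably restrict LD comparisons to a window along each chromosome. Reporting accuracy separately by MAF class mirrors the abstract's observation that low-MAF SNP benefit most from a larger reference panel.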
