Needles: Toward Large-Scale Genomic Prediction with Marker-by-Environment Interaction

Genetics, Volume 203, Issue 1, Pages 543-555, 2016
Arne De Coninck (1, 2), Bernard De Baets (1, 2), Drosos Kourounis (3), Fabio Verbosio (3), Olaf Schenk (3), Steven Maenhout (4), Jan Fostier (1, 5)
(1) Bioinformatics Institute Ghent, Ghent University, B-9000 Ghent, Belgium
(2) KERMIT, Department of Mathematical Modelling, Statistics and Bioinformatics, Ghent University, B-9000 Ghent, Belgium
(3) Institute of Computational Science, Università della Svizzera italiana, CH-6904 Lugano, Switzerland
(4) Progeno, B-9052 Zwijnaarde, Belgium
(5) Department of Information Technology (INTEC), Ghent University–iMinds, B-9000 Ghent, Belgium

Abstract

Genomic prediction relies on genotypic marker information to predict the agronomic performance of future hybrid breeds based on trial records. Because marker effects may vary substantially across environmental conditions, marker-by-environment interaction effects have to be taken into account. However, this may lead to a dramatic increase in the computational resources needed for analyzing large-scale trial data. A high-performance computing solution, called Needles, is presented for handling such data sets. Needles is tailored to the particular properties of the underlying algebraic framework: it exploits a sparse matrix formalism where suited and uses distributed computing techniques to run on a dedicated computing cluster. It is demonstrated that large-scale analyses can be performed within reasonable time frames with this framework. Moreover, by analyzing simulated trial data, it is shown that the effects of markers with a high environmental interaction can be predicted more accurately when more records per environment are available in the training data. The availability of such data and their analysis with Needles may also lead to the discovery of highly contributing QTL in specific environmental conditions. Such a framework thus opens the path for plant breeders to select crops based on these QTL, resulting in hybrid lines with optimized agronomic performance in specific environmental conditions.
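
To make the growth in computational cost concrete, here is a minimal NumPy sketch of a ridge-type marker-by-environment model solved through Henderson's mixed model equations. It is an illustration under stated assumptions, not the Needles implementation: the helpers `mxe_design` and `solve_mme` are hypothetical, the fixed effects are assumed to be one mean per environment, and the variance ratios `lam_u` and `lam_v` are assumed known (no REML step). With m markers and E environments, the number of marker-related effects grows from m to m(1 + E); but because every record belongs to exactly one environment, the blocks of the coefficient matrix that couple the interaction effects of two different environments are exactly zero, which is the kind of structure a sparse solver can exploit.

```python
import numpy as np


def mxe_design(Z, env, n_env):
    """Hypothetical helper: random-effect design W = [Z | Z_1 | ... | Z_E].

    Z     : (n, m) marker genotypes of the n trial records
    env   : (n,) environment index of each record, in 0..n_env-1
    n_env : number of environments E
    Row i of Z_k equals row i of Z if record i comes from environment k,
    and is zero otherwise.
    """
    n, m = Z.shape
    W = np.zeros((n, m * (1 + n_env)))
    W[:, :m] = Z                      # main marker effects u, shared by all environments
    for i, k in enumerate(env):
        start = m * (1 + k)           # first column of environment k's block
        W[i, start:start + m] = Z[i]  # marker-by-environment effects v_k
    return W


def solve_mme(X, W, y, m, n_env, lam_u, lam_v):
    """Hypothetical helper: assemble and solve Henderson's mixed model equations.

    lam_u = sigma_e^2 / sigma_u^2 and lam_v = sigma_e^2 / sigma_v^2 are
    assumed known (no variance component estimation is done here).
    Returns the coefficient matrix C, the fixed effects and the marker-related effects.
    """
    p = X.shape[1]
    # Shrinkage: lam_u on the m main effects, lam_v on the m*E interaction effects
    penalty = np.concatenate([np.full(m, lam_u), np.full(m * n_env, lam_v)])
    C = np.block([
        [X.T @ X, X.T @ W],
        [W.T @ X, W.T @ W + np.diag(penalty)],
    ])
    rhs = np.concatenate([X.T @ y, W.T @ y])
    sol = np.linalg.solve(C, rhs)     # dense solve: only viable at toy sizes
    return C, sol[:p], sol[p:]


# Toy example: 12 records, 4 markers, 3 environments (4 records each)
rng = np.random.default_rng(0)
n, m, E = 12, 4, 3
Z = rng.choice([-1.0, 0.0, 1.0], size=(n, m))
env = np.repeat(np.arange(E), n // E)
X = np.eye(E)[env]                    # fixed effects: one mean per environment
y = rng.normal(size=n)

W = mxe_design(Z, env, E)
C, beta, effects = solve_mme(X, W, y, m, E, lam_u=1.0, lam_v=2.0)
print(W.shape)                        # (12, 16): m * (1 + E) marker-related effects

# Interaction blocks of two different environments are never coupled in C
p = X.shape[1]
blk = C[p + m:p + 2 * m, p + 2 * m:p + 3 * m]   # environment 0 vs environment 1
print(np.count_nonzero(blk))          # 0
```

The main-effect rows and columns remain dense and couple to every environment, while the interaction part is block-diagonal. A dense `np.linalg.solve` ignores this; at the data sizes the abstract targets, the same structure would presumably have to be stored sparsely and factorized on a cluster.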
