High-Performance Mixed Models Based Genome-Wide Association Analysis with omicABEL software

F1000Research - Volume 3 - Page 200
Diego Fabregat‐Traver1, Sodbo Sharapov2, Caroline Hayward3, Igor Rudan4, Harry Campbell5, Yurii S. Aulchenko6, Paolo Bientinesi1
1Aachen Institute for Advanced Study in Computational Engineering Science, Aachen, 52062, Germany.
2Institute of Cytology and Genetics, Siberian Division of the Russian Academy of Sciences, Novosibirsk, 630090, Russian Federation; Novosibirsk State University, Novosibirsk, 630090, Russian Federation.
3MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK.
4Centre for Population Health Sciences, University of Edinburgh, Edinburgh, EH8 9AG, UK; Split University, Split, 21000, Croatia.
5Centre for Population Health Sciences, University of Edinburgh, Edinburgh, EH8 9AG, UK.
6Institute of Cytology and Genetics, Siberian Division of the Russian Academy of Sciences, Novosibirsk, 630090, Russian Federation; Novosibirsk State University, Novosibirsk, 630090, Russian Federation; Centre for Population Health Sciences, University of Edinburgh, Edinburgh, EH8 9AG, UK.

Abstract

To increase the power of genome-wide association studies (GWAS) and avoid false-positive results in structured populations, one can rely on mixed-model-based tests. With large samples, and when multiple traits are studied in the 'omics' context, this approach becomes computationally challenging. Here we consider the problem of mixed-model-based GWAS for an arbitrary number of traits, and demonstrate that different computational algorithms are optimal for the single-trait and multiple-trait scenarios. We implement these optimal algorithms in a high-performance computing framework that uses state-of-the-art linear algebra kernels, incorporates optimizations, and avoids redundant computations, increasing throughput while reducing memory usage and energy consumption. We show that, compared to existing libraries, our algorithms and software achieve considerable speed-ups. The OmicABEL software described in this manuscript is available under the GNU GPL v. 3 license as part of the GenABEL project for statistical genomics at http://www.genabel.org/packages/OmicABEL.
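To illustrate what "avoiding redundant computations" can mean in this setting, the following is a minimal NumPy sketch, not OmicABEL's actual implementation. It assumes a single trait whose covariance matrix `M` is already known, and treats the scan as one generalized least-squares (GLS) problem per SNP. The function name `gls_scan` and all variable names are hypothetical. The point is that the Cholesky factor of `M`, the whitened phenotype, and the covariate block of the normal equations are computed once and reused for every SNP.

```python
import numpy as np

def gls_scan(y, X_cov, G, M):
    """Illustrative GLS scan (hypothetical helper, not the OmicABEL API).

    y     : (n,)   phenotype vector
    X_cov : (n, p) covariates, including an intercept column
    G     : (n, m) genotypes, one column per SNP
    M     : (n, n) symmetric positive-definite covariance of y (assumed known)

    Returns an (m, p + 1) array; the last column holds the SNP effect.
    """
    L = np.linalg.cholesky(M)        # factor M once, not once per SNP
    # Whiten everything with L^{-1}. A general solver is used for brevity;
    # a triangular solver would exploit the structure of L.
    yw = np.linalg.solve(L, y)
    Xw = np.linalg.solve(L, X_cov)
    Gw = np.linalg.solve(L, G)       # all SNPs whitened in a single call

    # Parts of the normal equations that do not depend on the SNP
    XtX = Xw.T @ Xw
    Xty = Xw.T @ yw

    p = X_cov.shape[1]
    m = G.shape[1]
    betas = np.empty((m, p + 1))
    for i in range(m):
        g = Gw[:, i]
        A = np.empty((p + 1, p + 1))
        A[:p, :p] = XtX              # reused block
        A[:p, p] = Xw.T @ g          # SNP-specific border
        A[p, :p] = A[:p, p]
        A[p, p] = g @ g
        b = np.append(Xty, g @ yw)
        betas[i] = np.linalg.solve(A, b)
    return betas
```

A naive alternative would invert `M` and rebuild the full normal equations for every SNP. The sketch gives the same estimates while doing the expensive factorization only once.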
