PRSice-2: Polygenic Risk Score software for biobank-scale data

Oxford University Press (OUP) - Volume 8, Issue 7 - 2019
Shing Wan Choi (1, 2), Paul F. O’Reilly (1, 2)
(1) Department of Genetics and Genomic Sciences, Icahn School of Medicine, Mount Sinai, 1 Gustave L. Levy Pl, New York City, NY 10029, USA
(2) MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, De Crespigny Park, Denmark Hill, London, UK, SE5 8AF

Abstract

Background

Polygenic risk score (PRS) analyses have become an integral part of biomedical research, exploited to gain insights into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference, among a range of applications. Substantial efforts are now devoted to biobank projects that collect large-scale genetic and phenotypic data, providing an unprecedented opportunity for genetic discovery and its applications. Processing the large-scale data provided by such biobank resources requires highly efficient and scalable methods and software.
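
In its simplest form, a PRS for an individual is a weighted sum of that individual's effect-allele dosages, with per-variant GWAS effect sizes as the weights. The minimal Python sketch here illustrates that calculation. The function names, the toy inputs, and the optional GWAS P-value threshold filter (in the spirit of clumping-and-thresholding, with the clumping step omitted) are illustrative assumptions, not PRSice-2's actual implementation; some tools also report an average rather than a raw sum.

```python
from typing import List, Sequence


def polygenic_score(dosages: Sequence[float], betas: Sequence[float]) -> float:
    """Return sum_j beta_j * dosage_j for one individual.

    dosages: effect-allele counts (0, 1, 2), or fractional dosages for imputed data.
    betas:   per-variant effect sizes from GWAS summary statistics
             (e.g. log odds ratios for a binary trait).
    """
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(b * g for b, g in zip(betas, dosages))


def scores_at_threshold(
    genotypes: Sequence[Sequence[float]],
    betas: Sequence[float],
    pvalues: Sequence[float],
    threshold: float,
) -> List[float]:
    """Score every individual using only variants with GWAS P <= threshold.

    Hypothetical helper for illustration; real pipelines would first clump
    variants to remove those in linkage disequilibrium.
    """
    keep = [j for j, p in enumerate(pvalues) if p <= threshold]
    return [
        polygenic_score([row[j] for j in keep], [betas[j] for j in keep])
        for row in genotypes
    ]


if __name__ == "__main__":
    # Two individuals, three variants (toy data, not from the paper).
    genotypes = [[0, 1, 2], [2, 0, 1]]
    betas = [0.2, -0.1, 0.05]
    pvalues = [1e-8, 0.03, 0.4]
    print(scores_at_threshold(genotypes, betas, pvalues, 0.05))
```

At a threshold of 0.05 only the first two variants contribute, so the first individual scores 1 × (-0.1) and the second 2 × 0.2.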

Results

Here we introduce PRSice-2, an efficient and scalable software program for automating and simplifying PRS analyses on large-scale data. PRSice-2 handles both genotyped and imputed data, provides empirical association P-values free from inflation due to overfitting, supports different inheritance models, and can evaluate multiple continuous and binary target traits simultaneously. We demonstrate that PRSice-2 is dramatically faster and more memory-efficient than PRSice-1 and than the alternative PRS software packages LDpred and lassosum, while having comparable predictive power.
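
When scores are computed at many P-value thresholds and the best-fitting one is kept, the best fit is biased upwards. One standard way to obtain an empirical P-value that accounts for this search is to permute the phenotype, repeat the whole threshold search on each permuted copy, and compare the observed best fit with the permuted best fits. The Python sketch here assumes a continuous trait with no covariates, uses |Pearson r| as the fit statistic, and uses the (hits + 1)/(n + 1) estimator; all names and these simplifications are assumptions for illustration, not PRSice-2's code.

```python
import random
from statistics import mean
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def best_fit(scores_by_threshold: Dict[str, List[float]], phenotype: Sequence[float]) -> float:
    """Largest |r| between the phenotype and the PRS across all thresholds."""
    return max(abs(pearson_r(s, phenotype)) for s in scores_by_threshold.values())


def empirical_p(
    scores_by_threshold: Dict[str, List[float]],
    phenotype: Sequence[float],
    n_perm: int = 1000,
    seed: int = 1,
) -> float:
    """Permutation P-value for the best-threshold PRS fit.

    The maximum over thresholds is taken inside every permutation, so the
    null distribution reflects the same threshold search as the observed fit.
    """
    rng = random.Random(seed)
    observed = best_fit(scores_by_threshold, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if best_fit(scores_by_threshold, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With no covariates and a fixed sample size, ranking by |r| gives the same ordering as ranking by a simple linear-regression P-value, which is why it can stand in for the association test in this sketch; with covariates or binary traits a regression-based statistic would be needed instead.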

Conclusion

PRSice-2's combination of efficiency and power will be increasingly important as data sizes grow and as the applications of PRS become more sophisticated, e.g., when incorporated into high-dimensional or gene set–based analyses. PRSice-2 is written in C++, with an R script for plotting, and is freely available for download from http://PRSice.info.
