PRSice-2: Polygenic Risk Score software for biobank-scale data
Abstract
Polygenic risk score (PRS) analyses have become an integral part of biomedical research, exploited to gain insights into shared aetiology among traits, to control for genomic profile in experimental studies, and to strengthen causal inference, among a range of applications. Substantial efforts are now devoted to biobank projects that collect genetic and phenotypic data at large scale, providing unprecedented opportunities for genetic discovery and applications. Processing the large-scale data provided by such biobank resources requires highly efficient and scalable methods and software.
Here we introduce PRSice-2, an efficient and scalable software program for automating and simplifying PRS analyses on large-scale data. PRSice-2 handles both genotyped and imputed data, provides empirical association P-values free from inflation due to overfitting, supports different inheritance models, and can evaluate multiple continuous and binary target traits simultaneously. We demonstrate that PRSice-2 is dramatically faster and more memory-efficient than PRSice-1 and alternative PRS software, LDpred and lassosum, while having comparable predictive power.
PRSice-2's combination of efficiency and power will be increasingly important as data sizes grow and as the applications of PRS become more sophisticated, e.g., when incorporated into high-dimensional or gene set–based analyses. PRSice-2 is written in C++, with an R script for plotting, and is freely available for download from http://PRSice.info.
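To make two ideas from the abstract concrete, the sketch below computes a PRS as a weighted sum of allele dosages and derives an empirical P-value by permuting the phenotype. It is a minimal illustration in plain Python, not PRSice-2's implementation (which is written in C++). All function names, the use of squared correlation as the fit statistic, and the toy data layout are assumptions made for this example. The key point it tries to show is that the search over P-value thresholds is repeated inside every permutation, so the resulting P-value accounts for having picked the best threshold.

```python
import random


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))


def select_snps(p_values, threshold):
    """Indices of SNPs whose base-GWAS P-value is at or below the threshold."""
    return [i for i, p in enumerate(p_values) if p <= threshold]


def r_squared(x, y):
    """Squared Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_fit(genotypes, weights, p_values, phenotype, thresholds):
    """Best R^2 over all candidate P-value thresholds (hypothetical helper)."""
    best = 0.0
    for t in thresholds:
        idx = select_snps(p_values, t)
        scores = [
            polygenic_score([g[i] for i in idx], [weights[i] for i in idx])
            for g in genotypes
        ]
        best = max(best, r_squared(scores, phenotype))
    return best


def empirical_p(genotypes, weights, p_values, phenotype, thresholds,
                n_perm=1000, seed=0):
    """Permutation P-value for the best-threshold fit.

    The threshold search is rerun on each shuffled phenotype, and the
    (r + 1) / (n + 1) estimator keeps the result strictly above zero.
    """
    rng = random.Random(seed)
    observed = best_fit(genotypes, weights, p_values, phenotype, thresholds)
    shuffled = list(phenotype)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_best = best_fit(genotypes, weights, p_values, shuffled, thresholds)
        if null_best >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

In this sketch each row of `genotypes` holds one individual's dosages, `weights` holds per-SNP effect sizes from the base GWAS, and `p_values` holds the corresponding base-GWAS P-values. A real analysis would also handle clumping, covariates and missing genotypes, which are omitted here.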