FastSKAT: Sequence kernel association tests for very large sets of markers

Genetic Epidemiology - Volume 42, Issue 6 - Pages 516-527 - 2018
Thomas Lumley (1), Jennifer A. Brody (2), Gina M. Peloso (3), Alanna C. Morrison (4), Kenneth Rice (5)
(1) Department of Statistics, University of Auckland, Auckland, New Zealand
(2) Cardiovascular Health Research Unit, University of Washington, Seattle, Washington
(3) Department of Biostatistics, Boston University, Boston, Massachusetts
(4) University of Texas Health Science Center at Houston, Houston, Texas
(5) Department of Biostatistics, University of Washington, Seattle, Washington

Abstract

The sequence kernel association test (SKAT) is widely used to test for associations between a phenotype and a set of genetic variants that are usually rare. Evaluating tail probabilities or quantiles of the null distribution for SKAT requires computing the eigenvalues of a matrix related to the genotype covariance between markers. Extracting the full set of eigenvalues of this matrix (an n × n matrix, for n subjects) has computational complexity proportional to n^3. As SKAT is often used when n is large, this step becomes a major bottleneck in its use in practice. We therefore propose fastSKAT, a new computationally inexpensive but accurate approximation to the tail probabilities, in which the k largest eigenvalues of a weighted genotype covariance matrix or the largest singular values of a weighted genotype matrix are extracted, and a single term based on the Satterthwaite approximation is used for the remaining eigenvalues. While the method is not particularly sensitive to the choice of k, we also describe how to choose its value, and show how fastSKAT can automatically alert users to the rare cases where the choice may affect results. As well as providing a faster implementation of SKAT, the new method also enables entirely new applications of SKAT that were not possible before; we give examples grouping variants by topologically associating domains, and comparing chromosome-wide association by class of histone marker.
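
As an illustration of the "single term for the remaining eigenvalues" idea, here is a minimal sketch, not the authors' implementation. It assumes the usual SKAT null distribution, a weighted sum of independent chi-squared(1) variables with the eigenvalues as weights, and replaces the contribution of all eigenvalues beyond the k largest by one scaled chi-squared term whose mean and variance match. The function names `trace_and_trace_of_square` and `satterthwaite_remainder` are hypothetical. The sketch computes the traces directly from a small dense matrix; how they are obtained for very large problems is not specified in the abstract.

```python
def trace_and_trace_of_square(h):
    """Return tr(H) and tr(H^2) for a symmetric matrix given as nested lists."""
    trace = sum(h[i][i] for i in range(len(h)))
    # For a symmetric matrix, tr(H^2) equals the sum of squared entries.
    trace_sq = sum(x * x for row in h for x in row)
    return trace, trace_sq


def satterthwaite_remainder(top_eigenvalues, trace, trace_sq):
    """Moment-match the eigenvalues NOT in top_eigenvalues by scale * chi2(df).

    The sum of all eigenvalues is tr(H) and the sum of their squares is
    tr(H^2), so the remainder sums follow by subtraction, without ever
    computing the remaining eigenvalues.
    """
    rem_sum = trace - sum(top_eigenvalues)
    rem_sumsq = trace_sq - sum(lam * lam for lam in top_eigenvalues)
    if rem_sum <= 0 or rem_sumsq <= 0:
        # Nothing left over (for example, k equals the matrix dimension).
        return 0.0, 0.0
    scale = rem_sumsq / rem_sum      # matches the variance, 2 * scale^2 * df
    df = rem_sum ** 2 / rem_sumsq    # matches the mean, scale * df
    return scale, df


# Toy example: a diagonal matrix with known eigenvalues, keeping k = 2.
eigenvalues = [5.0, 3.0, 1.0, 1.0, 1.0, 1.0]
h = [[eigenvalues[i] if i == j else 0.0 for j in range(6)] for i in range(6)]
trace, trace_sq = trace_and_trace_of_square(h)
scale, df = satterthwaite_remainder(eigenvalues[:2], trace, trace_sq)
print(scale, df)
```

In this toy case the four discarded eigenvalues are all equal to 1, so the remainder is exactly a chi-squared variable with 4 degrees of freedom, and the sketch returns a scale of 1.0 and 4.0 degrees of freedom. In general the single term is only an approximation, which is why the choice of k can matter.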
