OpenMendel: a cooperative programming project for statistical genetics
Abstract
Keywords
References
Aird I, Bentall HH, Roberts JF (1953) Relationship between cancer of stomach and the ABO blood groups. Br Med J 1(4814):799
Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19(9):1655–1664. https://doi.org/10.1101/gr.094052.109
Amin N, Van Duijn CM, Aulchenko YS (2007) A genomic background based method for association analysis in related individuals. PLoS One 2(12):e1274
Astle W, Balding DJ (2009) Population structure and cryptic relatedness in genetic association studies. Stat Sci 24(4):451–471
Bahmani S, Raj B, Boufounos PT (2013) Greedy sparsity-constrained optimization. J Mach Learn Res 14(Mar):807–841
Bezanson J, Edelman A, Karpinski S, Shah VB (2017) Julia: a fresh approach to numerical computing. SIAM Rev 59(1):65–98. https://doi.org/10.1137/141000671
Bickerstaffe A, Ranaweera T, Endersby T, Ellis C, Maddumarachchi S, Gooden GE, White P, Moses EK, Hewitt AW, Hopper JL (2017) The Ark: a customizable web-based data management tool for health and medical research. Bioinformatics 33(4):624–626. https://doi.org/10.1093/bioinformatics/btw675
Blumensath T, Davies ME (2008) Iterative thresholding for sparse approximations. J Fourier Anal Appl 14(5–6):629–654
Blumensath T, Davies ME (2009) Iterative hard thresholding for compressed sensing. Appl Comput Harmon Anal 27(3):265–274
Boerwinkle E, Sing C (1987) The use of measured genotype information in the analysis of quantitative phenotypes in man. Ann Hum Genet 51(3):211–226
Brody JA, Morrison AC, Bis JC, O’Connell JR, Brown MR, Huffman JE, Ames DC, Carroll A, Conomos MP, Gabriel S et al (2017) Analysis Commons, a team approach to discovery in a big-data environment for genetic epidemiology. Nat Genet 49(11):1560
Bühlmann P, Van De Geer S (2011) Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media, New York
Burgess S, Thompson SG (2015) Mendelian randomization: methods for using genetic variants in causal estimation. Chapman and Hall/CRC, Boca Raton
Candès EJ, Recht B (2009) Exact matrix completion via convex optimization. Found Comput Math 9(6):717–772. https://doi.org/10.1007/s10208-009-9045-5
Cantor RM, Lange K, Sinsheimer JS (2010) Prioritizing GWAS results: a review of statistical methods and recommendations for their application. Am J Hum Genet 86(1):6–22
Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, Chen R, Miriami E, Karczewski KJ, Hariharan M, Dewey FE, Cheng Y, Clark MJ, Im H, Habegger L, Balasubramanian S, O’Huallachain M, Dudley JT, Hillenmeyer S, Haraksingh R, Sharon D, Euskirchen G, Lacroute P, Bettinger K, Boyle AP, Kasowski M, Grubert F, Seki S, Garcia M, Whirl-Carrillo M, Gallardo M, Blasco MA, Greenberg PL, Snyder P, Klein TE, Altman RB, Butte AJ, Ashley EA, Gerstein M, Nadeau KC, Tang H, Snyder M (2012) Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148(6):1293–1307. https://doi.org/10.1016/j.cell.2012.02.009
Chen WM, Abecasis GR (2007) Family-based association tests for genomewide association scans. Am J Hum Genet 81(5):913–926
Chi EC, Zhou H, Chen GK, Del Vecchyo DO, Lange K (2013) Genotype imputation via matrix completion. Genome Res 23(3):509–518. https://doi.org/10.1101/gr.145821.112
Chiu CY, Jung J, Chen W, Weeks DE, Ren H, Boehnke M, Amos CI, Liu A, Mills JL, Ting Lee ML, Xiong M, Fan R (2016) Meta-analysis of quantitative pleiotropic traits for next-generation sequencing with multivariate functional linear models. Eur J Hum Genet 25:350. https://doi.org/10.1038/ejhg.2016.170
Clark MM, Blangero J, Dyer TD, Sobel EM, Sinsheimer JS (2016) The quantitative-MFG test: a linear mixed effect model to detect maternal-offspring gene interactions. Ann Hum Genet 80(1):63–80. https://doi.org/10.1111/ahg.12137
Claster A (2017) Julia joins petaflop club. URL https://juliacomputing.com/press/2017/09/12/julia-joins-petaflop-club.html
Conomos MP, Reiner AP, Weir BS, Thornton TA (2016) Model-free estimation of recent genetic relatedness. Am J Hum Genet 98(1):127–148
Cookson W, Liang L, Abecasis G, Moffatt M, Lathrop M (2009) Mapping complex disease traits with global gene expression. Nat Rev Genet 10(3):184
Day-Williams AG, Blangero J, Dyer TD, Lange K, Sobel EM (2011) Linkage analysis without defined pedigrees. Genet Epidemiol 35(5):360–370. https://doi.org/10.1002/gepi.20584
Falconer DS, Mackay TFC (1996) Introduction to quantitative genetics, pp 82–86
Fan R, Wang Y, Chiu CY, Chen W, Ren H, Li Y, Boehnke M, Amos CI, Moore JH, Xiong M (2016) Meta-analysis of complex diseases at gene level with generalized functional linear models. Genetics 202(2):457–470. https://doi.org/10.1534/genetics.115.180869
Fisher RA (1915) Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10(4):507–521
Fisher RA (1921) On the probable error of a coefficient of correlation deduced from a small sample. Metron 1:3–32
Gaziano JM, Concato J, Brophy M, Fiore L, Pyarajan S, Breeling J, Whitbourne S, Deen J, Shannon C, Humphries D, Guarino P, Aslan M, Anderson D, LaFleur R, Hammond T, Schaa K, Moser J, Huang G, Muralidhar S, Przygodzki R, O’Leary TJ (2016) Million Veteran Program: a mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 70:214–223. https://doi.org/10.1016/j.jclinepi.2015.09.016
Hall MA, Wallace J, Lucas A, Kim D, Basile AO, Verma SS, McCarty CA, Brilliant MH, Peissig PL, Kitchner TE et al (2017) PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies. Nat Commun 8(1):1167
Hastie T, Mazumder R, Lee JD, Zadeh R (2015) Matrix completion and low-rank SVD via fast alternating least squares. J Mach Learn Res 16(1):3367–3402. http://dl.acm.org/citation.cfm?id=2789272.2912106
Helgason A, Yngvadóttir B, Hrafnkelsson B, Gulcher J, Stefánsson K (2005) An Icelandic example of the impact of population structure on association studies. Nat Genet 37(1):90
Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR (2012) Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 44(8):955
Howie BN, Donnelly P, Marchini J (2009) A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet 5(6):e1000529. https://doi.org/10.1371/journal.pgen.1000529
Jacquard A (1974) The genetic structure of populations, vol 5. Springer Science & Business Media, New York
Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet 42(4):348–354
Kawaguchi ES, Suchard MA, Liu Z, Li G (2018) Scalable sparse Cox regression for large-scale survival data via broken adaptive ridge. arXiv:1712.00561 (in preparation)
Keys KL, Chen GK, Lange K (2017) Iterative hard thresholding for model selection in genome-wide association studies. Genet Epidemiol 41(8):756–768
Khanna R, Kyrillidis A (2018) IHT dies hard: provable accelerated iterative hard thresholding. In: International Conference on Artificial Intelligence and Statistics, pp 188–198
Kilpinen H, Barrett JC (2013) How next-generation sequencing is transforming complex disease genetics. Trends Genet 29(1):23–30
Kim J, Bai Y, Pan W (2015) An adaptive association test for multiple phenotypes with GWAS summary statistics. Genet Epidemiol 39(8):651–663
Knowler WC, Williams R, Pettitt D, Steinberg AG (1988) Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture. Am J Hum Genet 43(4):520
Lange K (2003) Mathematical and statistical methods for genetic analysis. Springer Science & Business Media, New York
Lange K (2016) MM Optimization Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA. https://doi.org/10.1137/1.9781611974409.ch1
Lange K, Papp JC, Sinsheimer JS, Sripracha R, Zhou H, Sobel EM (2013) Mendel: the Swiss army knife of genetic analysis programs. Bioinformatics 29(12):1568–1570
Lange K, Sinsheimer J (1992) Calculation of genetic identity coefficients. Ann Hum Genet 56(4):339–346
Lee S, Abecasis GR, Boehnke M, Lin X (2014) Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 95(1):5–23. https://doi.org/10.1016/j.ajhg.2014.06.009
Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34(8):816–834. https://doi.org/10.1002/gepi.20533
Liberty E, Woolfe F, Martinsson PG, Rokhlin V, Tygert M (2007) Randomized algorithms for the low-rank approximation of matrices. Proc Natl Acad Sci USA 104(51):20167–20172. https://doi.org/10.1073/pnas.0709640104
Lippert C, Listgarten J, Liu Y, Kadie CM, Davidson RI, Heckerman D (2011) FaST linear mixed models for genome-wide association studies. Nat Methods 8(10):833–835
Liu Y, Athanasiadis G, Weale ME (2008) A survey of genetic simulation software for population and epidemiological studies. Hum Genom 3(1):79
Mancuso N, Shi H, Goddard P, Kichaev G, Gusev A, Pasaniuc B (2017) Integrating gene expression with summary association statistics to identify genes associated with 30 complex traits. Am J Hum Genet 100(3):473–487
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867–2873
Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39(7):906–913. https://doi.org/10.1038/ng2088
Mittal S, Madigan D, Burd RS, Suchard MA (2014) High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis. Biostatistics 15(2):207–221. https://doi.org/10.1093/biostatistics/kxt043
Morris AP, Lindgren CM, Zeggini E, Timpson NJ, Frayling TM, Hattersley AT, McCarthy MI (2010) A powerful approach to sub-phenotype analysis in population-based genetic association studies. Genet Epidemiol 34(4):335–343
Novembre J, Peter BM (2016) Recent advances in the study of fine-scale population structure in humans. Curr Opin Genet Dev 41:98–105
Pickrell WO, Rees MI, Chung SK (2012) Next generation sequencing methodologies: an overview. In: Advances in protein chemistry and structural biology, vol 89, pp 1–26. Elsevier
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38(8):904–909. https://doi.org/10.1038/ng1847
Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using multilocus genotype data. Genetics 155(2):945–959
Ranaweera T, Makalic E, Hopper JL, Bickerstaffe A (2018) An open-source, integrated pedigree data management and visualization tool for genetic epidemiology. Int J Epidemiol 47(4):1034–1039. https://doi.org/10.1093/ije/dyy049
Rosenberg NA, Li LM, Ward R, Pritchard JK (2003) Informativeness of genetic markers for inference of ancestry. Am J Hum Genet 73(6):1402–1422
Schäffer AA, Lemire M, Ott J, Lathrop GM, Weeks DE (2011) Coordinated conditional simulation with SLINK and SUP of many markers linked or associated to a trait in large pedigrees. Hum Hered 71(2):126–134
Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 70(2):425–434
Shen J, Li P (2017) A tight bound of hard thresholding. J Mach Learn Res 18(1):7650–7691
Sobel E, Lange K, O’Connell JR, Weeks DE (1996) Haplotyping algorithms. In: Genetic mapping and DNA sequencing, pp 89–110. Springer
Suchard MA, Simpson SE, Zorych I, Ryan P, Madigan D (2013) Massive parallelization of serial inference algorithms for a complex generalized linear model. ACM Transactions on Modeling and Computer Simulation (TOMACS) 23(1):article10:1–17. https://doi.org/10.1145/2414416.2414791
Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, Downey P, Elliott P, Green J, Landray M et al (2015) UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 12(3):e1001779
Svishcheva GR, Axenovich TI, Belonogova NM, van Duijn CM, Aulchenko YS (2012) Rapid variance components-based method for whole-genome association analysis. Nat Genet 44(10):1166
Telenti A, Pierce LCT, Biggs WH, di Iulio J, Wong EHM, Fabani MM, Kirkness EF, Moustafa A, Shah N, Xie C, Brewerton SC, Bulsara N, Garner C, Metzker G, Sandoval E, Perkins BA, Och FJ, Turpaz Y, Venter JC (2016) Deep sequencing of 10,000 human genomes. Proc Natl Acad Sci 113(42):11901–11906. https://doi.org/10.1073/pnas.1613365113
Van Dijk EL, Auger H, Jaszczyszyn Y, Thermes C (2014) Ten years of next-generation sequencing technology. Trends Genet 30(9):418–426
Van Leeuwen EM, Kanterakis A, Deelen P, Kattenberg MV, Abdellaoui A, Hofman A, Schönhuth A, Menelaou A, de Craen AJ, van Schaik BD et al (2015) Population-specific genotype imputations using Minimac or IMPUTE2. Nat Protoc 10(9):1285
Visscher PM, Brown MA, McCarthy MI, Yang J (2012) Five years of GWAS discovery. Am J Hum Genet 90(1):7–24
Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J (2017) 10 years of GWAS discovery: biology, function, and translation. Am J Hum Genet 101(1):5–22
Wang B, Sverdlov S, Thompson E (2017) Efficient estimation of realized kinship from SNP genotypes. Genetics 210(2)
Wang K, Li M, Bucan M (2007) Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 81(6):1278–1283
Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X (2011) Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet 89(1):82–93
Wu TT, Chen YF, Hastie T, Sobel E, Lange K (2009) Genome-wide association analysis by lasso penalized logistic regression. Bioinformatics 25(6):714–721
Yang F, Barber RF, Jain P, Lafferty J (2016) Selective inference for group-sparse linear models. In: Advances in Neural Information Processing Systems, pp 2469–2477
Yang J, Ferreira T, Morris AP, Medland SE, Madden PA, Heath AC, Martin NG, Montgomery GW, Weedon MN, Loos RJ et al (2012) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 44(4):369
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88(1):76–82. https://doi.org/10.1016/j.ajhg.2010.11.011
Yuan X, Miller DJ, Zhang J, Herrington D, Wang Y (2012) An overview of population genetic data simulation. J Comput Biol 19(1):42–54
Yuan XT, Li P, Zhang T (2017) Gradient hard thresholding pursuit. J Mach Learn Res 18:166–221
Zhou H, Alexander D, Lange K (2011) A quasi-Newton acceleration for high-dimensional optimization algorithms. Stat Comput 21(2):261–273
Zhou H, Alexander DH, Sehl ME, Sinsheimer JS, Sobel E, Lange K (2011) Penalized regression for genome-wide association screening of sequence data. In: Biocomputing 2011, pp 106–117. World Scientific
Zhou H, Blangero J, Dyer TD, Chan KhK, Lange K, Sobel EM (2017) Fast genome-wide QTL association mapping on pedigree and population data. Genet Epidemiol 41(3):174–186. https://doi.org/10.1002/gepi.21988
Zhou H, Hu L, Zhou J, Lange K (2018) MM algorithms for variance components models. J Comput Graph Stat (accepted). https://doi.org/10.1080/10618600.2018.1529601
Zhou H, Sehl ME, Sinsheimer JS, Lange K (2010) Association screening of common and rare genetic variants by penalized regression. Bioinformatics 26(19):2375–2382
Zhou JJ, Hu T, Qiao D, Cho MH, Zhou H (2016) Boosting gene mapping power and efficiency with efficient exact variance component tests of SNP sets. Genetics 204(3):921–931
Zhou JJ, Sinsheimer JS, Cho MH, Castaldi P, Zhou H (2018) MMVC: an efficient MM algorithm to quantify genetic correlations across large number of phenotypes in giant datasets. Manuscript in preparation
Zhou X, Stephens M (2014) Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nat Methods 11(4):407–409. https://doi.org/10.1038/nmeth.2848