Second-generation PLINK: rising to the challenge of larger and richer datasets
Tóm tắt
Từ khóa
Tài liệu tham khảo
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, Bender D, et al. Plink: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007; 81:559–75.
Browning B, Browning S. Improving the accuracy and efficiency of identity by descent detection in population data. Genetics. 2013; 194:459–71.
Howie B, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009; 5:1000529.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: A mapreduce framework for analyzing next-generation dna sequencing data. Genome Res. 2010; 20:1297–303.
Danecek P, Auton A, Abecasis G, Albers C, Banks E, DePristo M, et al. The variant call format and vcftools. Bioinformatics. 2011; 27:2156–8.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, 1000 Genome Project Data Processing Subgroup, et al. The sequence alignment/map format and samtools. Bioinformatics. 2009; 25:2078–9.
Yang J, Lee S, Goddard M, Visscher P. Gcta: A tool for genome-wide complex trait analysis. Am J Hum Genet. 2011; 88:76–82.
Chang C, Chow C, Tellier L, Vattikuti S, Purcell S, Lee J. Software and Supporting Material for "Second-generation PLINK: Rising to the Challenge of Larger and Richer Datasets". GigaScience Database. http://dx.doi.org/10.5524/100116 .
Dalke A. Update: Faster Population Counts. http://www.dalkescientific.com/writings/diary/archive/2011/11/02/faster_popcount_update.html .
Lee V, Kim C, Chhugani J, Deisher M, Kim D, Nguyen A, et al. Debunking the 100x gpu vs. cpu myth: an evaluation of throughput computing on cpu and gpu. In: Proceedings of the 37th Annual International Symposium on Computer Architecture: 19-23 June 2010. Saint-Malo, France,: ACM: 2010. p. 451–460.
Haque I, Pande V, Walters W. Anatomy of high-performance 2d similarity calculations. J Chem Inf Model. 2011; 51:2345–51.
Wigginton J, Cutler D, Abecasis G. A note on exact tests of hardy-weinberg equilibrium. Am J Hum Genet. 2005; 76:887–93.
Guo S, Thompson E. Performing the exact test of hardy-weinberg proportion for multiple alleles. Biometrics. 1992; 48:361–72.
Mehta C, Patel N. Algorithm 643: Fexact: a fortran subroutine for fisher’s exact test on unordered r ×c contingency tables. ACM Trans Math Softw. 1986; 12:154–61.
Clarkson D, Fan Y, Joe H. A remark on algorithm 643: Fexact: an algorithm for performing fisher’s exact test in r x c contingency tables. ACM Trans Math Softw. 1993; 19:484–8.
Requena F, Martín Ciudad N. A major improvement to the network algorithm for fisher’s exact test in 2 ×c contingency tables. J Comp Stat & Data Anal. 2006; 51:490–8.
Chang C. Standalone C/C++ Exact Statistical Test Functions. https://github.com/chrchang/stats .
Lydersen S, Fagerland M, Laake P. Recommended tests for association in 2 ×2 tables. Statist Med. 2009; 28:1159–75.
Graffelman J, Moreno V. The mid p-value in exact tests for hardy-weinberg equilibrium. Stat Appl Genet Mol Bio. 2013; 12:433–48.
Wall J, Pritchard J. Assessing the performance of the haplotype block model of linkage disequilibrium. Am J Hum Genet. 2003; 73:502–15.
Gabriel S, Schaffner S, Nguyen H, Moore J, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. Science. 2002; 296:2225–9.
Barrett J, Fry B, Maller J, Daly M. Haploview: analysis and visualization of ld and haplotype maps. Bioinformatics. 2005; 21:263–5.
Hill W. Estimation of linkage disequilibrium in randomly mating populations. Heredity. 1974; 33:229–39.
Gaunt T, Rodríguez S, Day I. Cubic exact solutions for the estimation of pairwise haplotype frequencies: implications for linkage disequilibrium analyses and a web tool ’cubex’. BMC Bioinformatics. 2007; 8:428.
Taliun D, Gamper J, Pattaro C. Efficient haplotype block recognition of very long and dense genetic sequences. BMC Bioinformatics. 2014; 15:10.
Friedman J, Hastie T, Höfling H, Tibshirani R. Pathwise coordinate optimization. Ann Appl Stat. 2007; 1:302–32.
Vattikuti S, Lee J, Chang C, Hsu S, Chow C. Applying compressed sensing to genome-wide association studies. GigaScience. 2014; 3:10.
Steiß V, Letschert T, Schäfer H, Pahl R. Permory-mpi: A program for high-speed parallel permutation testing in genome-wide association studies. Bioinformatics. 2012; 28:1168–9.
Wan X, Yang C, Yang Q, Xue H, Fan X, Tang N, et al. Boost: A fast approach to detecting gene-gene interactions in genome-wide case-control studies. Am J Hum Genet. 2010; 87:325–40.
Ueki M, Cordell H. Improved statistics for genome-wide interaction analysis. PLoS Genet. 2012; 8:1002625.
Howey R. CASSI: Genome-Wide Interaction Analysis Software. http://www.staff.ncl.ac.uk/richard.howey/cassi .
GWASSpeedup Problem Statement. http://community.topcoder.com/longcontest/?module=ViewProblemStatement&rd=15637&pm=12525 .
Adler M. Pigz: Parallel Gzip. http://zlib.net/pigz/ .
Abecasis G, Cardon L, Cookson W. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000; 66:279–92.
Ewens W, Li M, Spielman R. A review of family-based tests for linkage disequilibrium between a quantitative trait and a genetic marker. PLoS Genet. 2008; 4:1000180.
Su Z, Marchini J, Donnelly P. Hapgen2: Simulation of multiple disease snps. Bioinformatics. 2011; 27:2304–5.
Xu Y, Wu Y, Song C, Zhang H. Simulating realistic genomic data with rare variants. Genet Epidemiol. 2013; 37:163–72.
The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012; 491:56–65.
Browning B, Browning S. A fast, powerful method for detecting identity by descent. Am J Hum Genet. 2011; 88:173–82.
Browning B. Presto: rapid calculation of order statistic distributions and multiple-testing adjusted p-values via permutation for one and two-stage genetic association studies. BMC Bioinformatics. 2008; 9:309.
Sambo F, Di Camillo B, Toffolo G, Cobelli C. Compression and fast retrieval of snp data. Bioinformatics. 2014; 30:495.
PLINK/SEQ: A Library for the Analysis of Genetic Variation Data. https://atgu.mgh.harvard.edu/plinkseq/ .