Second-generation PLINK: rising to the challenge of larger and richer datasets

Oxford University Press (OUP) - Tập 4 Số 1 - 2015
Christopher Chang1,2, Carson C. Chow3, Laurent Tellier1,4, Shashaank Vattikuti3, Shaun Purcell5,6,7,8, James J. Lee9,3
1BGI Cognitive Genomics Lab, Building No. 11, Bei Shan Industrial Zone, Yantian District, 518083 Shenzhen, China
2Complete Genomics, 2071 Stierlin Court, 94043 Mountain View, CA, USA
3Mathematical Biology Section, NIDDK/LBM, National Institutes of Health, 20892 Bethesda, MD, USA
4Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen, Denmark
5Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, 02114 Boston, MA, USA
6Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, 10029 New York, NY, USA
7Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, 10029 New York, NY, USA
8Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, 02142 Cambridge, MA, USA
9Department of Psychology, University of Minnesota Twin Cities, 55455 Minneapolis, MN, USA

Tóm tắt

Từ khóa


Tài liệu tham khảo

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira M, Bender D, et al. Plink: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007; 81:559–75.

Browning B, Browning S. Improving the accuracy and efficiency of identity by descent detection in population data. Genetics. 2013; 194:459–71.

Howie B, Donnelly P, Marchini J. A flexible and accurate genotype imputation method for the next generation of genome-wide association studies. PLoS Genet. 2009; 5:1000529.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: A mapreduce framework for analyzing next-generation dna sequencing data. Genome Res. 2010; 20:1297–303.

Danecek P, Auton A, Abecasis G, Albers C, Banks E, DePristo M, et al. The variant call format and vcftools. Bioinformatics. 2011; 27:2156–8.

Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, 1000 Genome Project Data Processing Subgroup, et al. The sequence alignment/map format and samtools. Bioinformatics. 2009; 25:2078–9.

Yang J, Lee S, Goddard M, Visscher P. Gcta: A tool for genome-wide complex trait analysis. Am J Hum Genet. 2011; 88:76–82.

Chang C, Chow C, Tellier L, Vattikuti S, Purcell S, Lee J. Software and Supporting Material for "Second-generation PLINK: Rising to the Challenge of Larger and Richer Datasets". GigaScience Database. http://dx.doi.org/10.5524/100116 .

Dalke A. Update: Faster Population Counts. http://www.dalkescientific.com/writings/diary/archive/2011/11/02/faster_popcount_update.html .

Lee V, Kim C, Chhugani J, Deisher M, Kim D, Nguyen A, et al. Debunking the 100x gpu vs. cpu myth: an evaluation of throughput computing on cpu and gpu. In: Proceedings of the 37th Annual International Symposium on Computer Architecture: 19-23 June 2010. Saint-Malo, France,: ACM: 2010. p. 451–460.

Haque I, Pande V, Walters W. Anatomy of high-performance 2d similarity calculations. J Chem Inf Model. 2011; 51:2345–51.

Hardy H. Mendelian proportions in a mixed population. Science. 1908; 28:49–50.

Wigginton J, Cutler D, Abecasis G. A note on exact tests of hardy-weinberg equilibrium. Am J Hum Genet. 2005; 76:887–93.

Guo S, Thompson E. Performing the exact test of hardy-weinberg proportion for multiple alleles. Biometrics. 1992; 48:361–72.

Mehta C, Patel N. Algorithm 643: Fexact: a fortran subroutine for fisher’s exact test on unordered r ×c contingency tables. ACM Trans Math Softw. 1986; 12:154–61.

Clarkson D, Fan Y, Joe H. A remark on algorithm 643: Fexact: an algorithm for performing fisher’s exact test in r x c contingency tables. ACM Trans Math Softw. 1993; 19:484–8.

Requena F, Martín Ciudad N. A major improvement to the network algorithm for fisher’s exact test in 2 ×c contingency tables. J Comp Stat & Data Anal. 2006; 51:490–8.

Chang C. Standalone C/C++ Exact Statistical Test Functions. https://github.com/chrchang/stats .

Lydersen S, Fagerland M, Laake P. Recommended tests for association in 2 ×2 tables. Statist Med. 2009; 28:1159–75.

Graffelman J, Moreno V. The mid p-value in exact tests for hardy-weinberg equilibrium. Stat Appl Genet Mol Bio. 2013; 12:433–48.

Wall J, Pritchard J. Assessing the performance of the haplotype block model of linkage disequilibrium. Am J Hum Genet. 2003; 73:502–15.

Gabriel S, Schaffner S, Nguyen H, Moore J, Roy J, Blumenstiel B, et al. The structure of haplotype blocks in the human genome. Science. 2002; 296:2225–9.

Barrett J, Fry B, Maller J, Daly M. Haploview: analysis and visualization of ld and haplotype maps. Bioinformatics. 2005; 21:263–5.

Hill W. Estimation of linkage disequilibrium in randomly mating populations. Heredity. 1974; 33:229–39.

Gaunt T, Rodríguez S, Day I. Cubic exact solutions for the estimation of pairwise haplotype frequencies: implications for linkage disequilibrium analyses and a web tool ’cubex’. BMC Bioinformatics. 2007; 8:428.

Taliun D, Gamper J, Pattaro C. Efficient haplotype block recognition of very long and dense genetic sequences. BMC Bioinformatics. 2014; 15:10.

Friedman J, Hastie T, Höfling H, Tibshirani R. Pathwise coordinate optimization. Ann Appl Stat. 2007; 1:302–32.

Vattikuti S, Lee J, Chang C, Hsu S, Chow C. Applying compressed sensing to genome-wide association studies. GigaScience. 2014; 3:10.

Steiß V, Letschert T, Schäfer H, Pahl R. Permory-mpi: A program for high-speed parallel permutation testing in genome-wide association studies. Bioinformatics. 2012; 28:1168–9.

Wan X, Yang C, Yang Q, Xue H, Fan X, Tang N, et al. Boost: A fast approach to detecting gene-gene interactions in genome-wide case-control studies. Am J Hum Genet. 2010; 87:325–40.

Ueki M, Cordell H. Improved statistics for genome-wide interaction analysis. PLoS Genet. 2012; 8:1002625.

Howey R. CASSI: Genome-Wide Interaction Analysis Software. http://www.staff.ncl.ac.uk/richard.howey/cassi .

GWASSpeedup Problem Statement. http://community.topcoder.com/longcontest/?module=ViewProblemStatement&rd=15637&pm=12525 .

Adler M. Pigz: Parallel Gzip. http://zlib.net/pigz/ .

Abecasis G, Cardon L, Cookson W. A general test of association for quantitative traits in nuclear families. Am J Hum Genet. 2000; 66:279–92.

Ewens W, Li M, Spielman R. A review of family-based tests for linkage disequilibrium between a quantitative trait and a genetic marker. PLoS Genet. 2008; 4:1000180.

Su Z, Marchini J, Donnelly P. Hapgen2: Simulation of multiple disease snps. Bioinformatics. 2011; 27:2304–5.

Xu Y, Wu Y, Song C, Zhang H. Simulating realistic genomic data with rare variants. Genet Epidemiol. 2013; 37:163–72.

The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012; 491:56–65.

Defays D. An efficient algorithm for a complete link method. Comput J. 1977; 20:364–6.

Browning B, Browning S. A fast, powerful method for detecting identity by descent. Am J Hum Genet. 2011; 88:173–82.

Browning B. Presto: rapid calculation of order statistic distributions and multiple-testing adjusted p-values via permutation for one and two-stage genetic association studies. BMC Bioinformatics. 2008; 9:309.

Loh P, Baym M, Berger B. Compressive genomics. Nat Biotechnol. 2012; 30:627–30.

Sambo F, Di Camillo B, Toffolo G, Cobelli C. Compression and fast retrieval of snp data. Bioinformatics. 2014; 30:495.

PLINK/SEQ: A Library for the Analysis of Genetic Variation Data. https://atgu.mgh.harvard.edu/plinkseq/ .