Fast model-based estimation of ancestry in unrelated individuals

Genome Research - Volume 19, Issue 9 - Pages 1655-1664 - 2009

David H. Alexander¹, John Novembre², Kenneth Lange³

¹Department of Biomathematics, University of California at Los Angeles, Los Angeles, California 90095, USA

²Department of Ecology and Evolutionary Biology, University of California at Los Angeles, Los Angeles, California 90095, USA

³Department of Human Genetics and Department of Statistics, University of California at Los Angeles, Los Angeles, California 90095, USA

Abstract

Population stratification has long been recognized as a confounding factor in genetic association studies. Estimated ancestries, derived from multi-locus genotype data, can be used to perform a statistical correction for population stratification. One popular technique for estimation of ancestry is the model-based approach embodied by the widely applied program structure. Another approach, implemented in the program EIGENSTRAT, relies on Principal Component Analysis rather than model-based estimation and does not directly deliver admixture fractions. EIGENSTRAT has gained in popularity in part owing to its remarkable speed in comparison to structure. We present a new algorithm and a program, ADMIXTURE, for model-based estimation of ancestry in unrelated individuals. ADMIXTURE adopts the likelihood model embedded in structure. However, ADMIXTURE runs considerably faster, solving problems in minutes that take structure hours. In many of our experiments, we have found that ADMIXTURE is almost as fast as EIGENSTRAT. The runtime improvements of ADMIXTURE rely on a fast block relaxation scheme using sequential quadratic programming for block updates, coupled with a novel quasi-Newton acceleration of convergence. Our algorithm also runs faster and with greater accuracy than the implementation of an Expectation-Maximization (EM) algorithm incorporated in the program FRAPPE. Our simulations show that ADMIXTURE's maximum likelihood estimates of the underlying admixture coefficients and ancestral allele frequencies are as accurate as structure's Bayesian estimates. On real-world data sets, ADMIXTURE's estimates are directly comparable to those from structure and EIGENSTRAT. Taken together, our results show that ADMIXTURE's computational speed opens up the possibility of using a much larger set of markers in model-based ancestry estimation and that its estimates are suitable for use in correcting for population stratification in association studies.
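The abstract does not write out the likelihood that ADMIXTURE shares with structure. As a rough illustration, the sketch below assumes the usual binomial admixture model: individual i carries g_ij copies of a reference allele at marker j, Q[i][k] is the fraction of that individual's ancestry from population k, and F[k][j] is the reference-allele frequency in population k. The names `log_likelihood`, `em_step`, `G`, `Q`, `F`, and `eps` are illustrative choices, not identifiers from any of the programs mentioned. The update shown is a plain EM step of the kind the abstract attributes to FRAPPE; it is not ADMIXTURE's block relaxation with sequential quadratic programming or its quasi-Newton acceleration.

```python
import math


def log_likelihood(G, Q, F):
    """Binomial admixture log-likelihood (assumed model, see lead-in).

    G[i][j] is a genotype in {0, 1, 2}; each row of Q sums to 1;
    every F[k][j] lies strictly between 0 and 1.
    """
    K = len(F)
    total = 0.0
    for i, row in enumerate(G):
        for j, g in enumerate(row):
            # Probability that one allele copy of individual i at marker j
            # is the reference allele. Because Q[i] sums to 1, the
            # probability of the other allele is 1 - p.
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def em_step(G, Q, F, eps=1e-6):
    """One EM update of Q and F; returns new lists, inputs are unchanged."""
    I, J, K = len(G), len(G[0]), len(F)
    new_Q = [[0.0] * K for _ in range(I)]
    num = [[0.0] * J for _ in range(K)]  # expected reference copies from pop k
    den = [[0.0] * J for _ in range(K)]  # expected total copies from pop k
    for i in range(I):
        for j in range(J):
            g = G[i][j]
            p = sum(Q[i][k] * F[k][j] for k in range(K))
            for k in range(K):
                a = Q[i][k] * F[k][j] / p                  # share of reference copies
                b = Q[i][k] * (1.0 - F[k][j]) / (1.0 - p)  # share of other copies
                copies = g * a + (2 - g) * b
                num[k][j] += g * a
                den[k][j] += copies
                new_Q[i][k] += copies / (2 * J)
    # Clamp frequencies away from 0 and 1 so later logarithms stay finite
    # (a convenience of this sketch, not a documented feature of any program).
    new_F = [
        [min(1.0 - eps, max(eps, num[k][j] / den[k][j])) for j in range(J)]
        for k in range(K)
    ]
    return new_Q, new_F
```

Starting from interior values of Q and F and calling `em_step` repeatedly should give a non-decreasing `log_likelihood`, although typically only slowly, which is the kind of behavior the faster scheme described in the abstract is meant to improve on.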
