Inference of Population Structure Using Multilocus Genotype Data: Linked Loci and Correlated Allele Frequencies

Genetics - Volume 164, Issue 4 - Pages 1567-1587 - 2003
Daniel Falush1, Matthew Stephens2, Jonathan K. Pritchard3
1Department of Molecular Biology, Max-Planck Institut für Infektionsbiologie, 10117 Berlin, Germany
2Department of Statistics, University of Washington, Seattle, Washington, 98195
3Department of Human Genetics, University of Chicago, Chicago, Illinois 60637

Abstract

We describe extensions to the method of Pritchard et al. for inferring population structure from multilocus genotype data. Most importantly, we develop methods that allow for linkage between loci. The new model accounts for the correlations between linked loci that arise in admixed populations (“admixture linkage disequilibrium”). This modification has several advantages, allowing (1) detection of admixture events farther back into the past, (2) inference of the population of origin of chromosomal regions, and (3) more accurate estimates of statistical uncertainty when linked loci are used. It is also of potential use for admixture mapping. In addition, we describe a new prior model for the allele frequencies within each population, which allows identification of subtle population subdivisions that were not detectable using the existing method. We present results applying the new methods to study admixture in African-Americans, recombination in Helicobacter pylori, and drift in populations of Drosophila melanogaster. The methods are implemented in a program, structure, version 2.0, which is available at http://pritch.bsd.uchicago.edu.

References

Agis, 2001, Microsatellite variation in natural Drosophila melanogaster populations from New South Wales (Australia) and Tasmania, Mol. Ecol., 10, 1197, 10.1046/j.1365-294X.2001.01271.x

Anderson, 2001, Monte Carlo methods for inference in population genetic models, Ph.D. Thesis, University of Washington, Seattle

Anderson, 2002, A model-based method for identifying species hybrids using multilocus genetic data, Genetics, 160, 1217, 10.1093/genetics/160.3.1217

Balding, 1997, Significant genetic correlations among Caucasians at forensic DNA loci, Heredity, 78, 583, 10.1038/hdy.1997.97

Barton, 1989, Adaptation, speciation and hybrid zones, Nature, 341, 497, 10.1038/341497a0

Beaumont, 2001, Genetic diversity and introgression in the Scottish wildcat, Mol. Ecol., 10, 319, 10.1046/j.1365-294x.2001.01196.x

Bertorelle, 1998, Inferring admixture proportions from molecular data, Mol. Biol. Evol., 15, 1298, 10.1093/oxfordjournals.molbev.a025858

Broman, 1998, Comprehensive human genetic maps: individual and sex-specific variation in recombination, Am. J. Hum. Genet., 63, 861, 10.1086/302011

Chakraborty, 1988, Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci, Proc. Natl. Acad. Sci. USA, 85, 9119, 10.1073/pnas.85.23.9119

Chikhi, 2001, Estimation of admixture proportions: a likelihood-based approach using Markov chain Monte Carlo, Genetics, 158, 1347, 10.1093/genetics/158.3.1347

Cooper, 2002, A genome-wide scan among Nigerians linking blood pressure to regions on chromosomes 2, 3, and 19, Hypertension, 40, 629, 10.1161/01.HYP.0000035708.02789.39

Daly, 2001, High-resolution haplotype structure in the human genome, Nat. Genet., 29, 229, 10.1038/ng1001-229

Dawson, 2001, A Bayesian approach to the identification of panmictic populations and the assignment of individuals, Genet. Res., 78, 59, 10.1017/S001667230100502X

Erosheva, 2002, Grade of membership and latent structure models with application to disability survey data, Ph.D. Thesis, Department of Statistics, Carnegie Mellon University, Pittsburgh

Excoffier, 2001, Analysis of population subdivision, Handbook of Statistical Genetics, 271

Falush, 2001, Recombination and mutation during long-term gastric colonization by Helicobacter pylori: estimates of clock rates, recombination size, and minimal age, Proc. Natl. Acad. Sci. USA, 98, 1505, 10.1073/pnas.251396098

Falush, 2003, Traces of human migrations in Helicobacter pylori populations, Science, 299, 1582, 10.1126/science.1080857

Gilks, 1996, Markov Chain Monte Carlo in Practice

Guglielmino, 1990, Uralic genes in Europe, Am. J. Phys. Anthropol., 83, 57, 10.1002/ajpa.1330830107

Knowler, 1988, Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with genetic admixture, Am. J. Hum. Genet., 43, 520

Kong, 2002, A high-resolution recombination map of the human genome, Nat. Genet., 31, 241, 10.1038/ng917

Kumar, 2001, MEGA2: molecular evolutionary genetics analysis software, Bioinformatics, 17, 1244, 10.1093/bioinformatics/17.12.1244

Long, 1991, The genetic structure of admixed populations, Genetics, 127, 417, 10.1093/genetics/127.2.417

Marchini, 2002, Discussion on statistical modelling and genetic data, J. R. Stat. Soc. B, 64, 740

McKeigue, 1998, Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations, by conditioning on parental admixture, Am. J. Hum. Genet., 63, 241, 10.1086/301908

McKeigue, 2000, Estimation of admixture and detection of linkage in admixed populations by a Bayesian approach: application to African-American populations, Ann. Hum. Genet., 64, 171, 10.1046/j.1469-1809.2000.6420171.x

Nei, 1979, Mathematical model for studying genetic variation in terms of restriction endonucleases, Proc. Natl. Acad. Sci. USA, 76, 5269, 10.1073/pnas.76.10.5269

Nicholson, 2002, Assessing population differentiation and isolation from single nucleotide polymorphism data, J. R. Stat. Soc. B, 64, 695, 10.1111/1467-9868.00357

Parra, 1998, Estimating African American admixture proportions by use of population-specific alleles, Am. J. Hum. Genet., 63, 1839, 10.1086/302148

Pfaff, 2001, Population structure in admixed populations: effect of admixture dynamics on the pattern of linkage disequilibrium, Am. J. Hum. Genet., 68, 198, 10.1086/316935

Pritchard, 2000, Inference of population structure using multilocus genotype data, Genetics, 155, 945, 10.1093/genetics/155.2.945

Rabiner, 1989, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE, 77, 257, 10.1109/5.18626

Rieseberg, 1999, Hybrid zones and the genetic architecture of a barrier to gene flow between two sunflower species, Genetics, 152, 713, 10.1093/genetics/152.2.713

Rosenberg, 2002, Genetic structure of human populations, Science, 298, 2381, 10.1126/science.1078311

Satten, 2001, Accounting for unmeasured population structure in case-control studies of genetic association using a novel latent-class model, Am. J. Hum. Genet., 68, 466, 10.1086/318195

Sillanpää, 2001, Bayesian association mapping for quantitative traits in a mixture of two populations, Genet. Epidemiol., 21, 692, 10.1002/gepi.2001.21.s1.s692

Sites, 1995, The genetic-structure of a hybrid zone between 2 chromosome races of the Sceloporus grammicus complex (Sauria, Phrynosomatidae) in central Mexico, Evolution, 49, 9, 10.1111/j.1558-5646.1995.tb05955.x

Stephens, 1994, Mapping admixture linkage disequilibrium in human populations: limits and guidelines, Am. J. Hum. Genet., 55, 809

Stephens, 2001, A new statistical method for haplotype reconstruction from population data, Am. J. Hum. Genet., 68, 978, 10.1086/319501

Thiel, 2003, A genome wide linkage analysis investigating the determinants of blood pressure in Caucasians and African Americans, Am. J. Hypertens., 16, 151, 10.1016/S0895-7061(02)03246-6

Thompson, 1973, The Icelandic admixture problem, Ann. Hum. Genet., 37, 69, 10.1111/j.1469-1809.1973.tb01815.x

Thornsberry, 2001, Dwarf8 polymorphisms associate with variation in flowering time, Nat. Genet., 28, 286, 10.1038/90135

Wright, 1951, The genetical structure of populations, Ann. Eugen., 15, 323, 10.1111/j.1469-1809.1949.tb02451.x