Inference of Population Structure Using Multilocus Genotype Data

Genetics - Volume 155, Issue 2 - Pages 945-959 - 2000
Jonathan K. Pritchard, Matthew Stephens, Peter Donnelly
Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom

Abstract

We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci, e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from http://www.stats.ox.ac.uk/~pritch/home.html.
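The abstract does not spell out the computations, so the following is only a minimal sketch of the probabilistic assignment idea, not the authors' software. It assumes that the allele frequencies of each of the K populations are already known, that individuals are diploid and not admixed, that loci are unlinked, that genotypes within a population follow Hardy-Weinberg proportions, and that the prior over populations is uniform. The function name `assignment_probabilities` and the example frequencies are hypothetical.

```python
import math


def assignment_probabilities(genotype, allele_freqs):
    """Posterior probability that one individual belongs to each population.

    genotype: list of (allele_a, allele_b) pairs, one pair per locus.
    allele_freqs: allele_freqs[k][l] is a dict mapping allele -> frequency
        at locus l in population k (assumed known and strictly positive).
    Assumptions (not from the abstract): uniform prior over populations,
    independent loci, Hardy-Weinberg proportions, no admixture.
    """
    log_liks = []
    for pop in allele_freqs:
        log_lik = 0.0
        for (a, b), freqs in zip(genotype, pop):
            # Two allele copies drawn independently from the population.
            # The heterozygote factor of 2 is identical for every
            # population, so it cancels when the weights are normalized.
            log_lik += math.log(freqs[a]) + math.log(freqs[b])
        log_liks.append(log_lik)

    # Normalize in log space to avoid underflow with many loci.
    largest = max(log_liks)
    weights = [math.exp(x - largest) for x in log_liks]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: K = 2 populations, 2 loci.
example_freqs = [
    [{"A": 0.9, "B": 0.1}, {"C": 0.8, "D": 0.2}],  # population 0
    [{"A": 0.2, "B": 0.8}, {"C": 0.3, "D": 0.7}],  # population 1
]
example_genotype = [("A", "A"), ("C", "D")]
probs = assignment_probabilities(example_genotype, example_freqs)
print(probs)
```

In this toy case the individual's genotype is far more probable under population 0, so most of the posterior mass lands there. When the allele frequencies and K are unknown, as the abstract allows, they must be estimated jointly with the assignments, which this sketch does not attempt.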
