Robust Demographic Inference from Genomic and SNP Data

PLoS Genetics - Volume 9, Issue 10 - Page e1003905
Laurent Excoffier1,2, Isabelle Dupanloup1,2, Emilia Huerta‐Sánchez3, Vítor C. Sousa1,2, Matthieu Foll1,4,2
1CMPG, Institute of Ecology and Evolution, Berne, Switzerland
2Swiss Institute of Bioinformatics, Lausanne, Switzerland
3Center for Theoretical Evolutionary Genomics, Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America
4School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

References

R Nielsen, 2007, Recent and ongoing selection in the human genome, Nat Rev Genet, 8, 857, 10.1038/nrg2187

JL Kelley, 2006, Genomic signatures of positive selection in humans and the limits of outlier approaches, Genome Res, 16, 980, 10.1101/gr.5157306

R Nielsen, 2005, Genomic scans for selective sweeps using SNP data, Genome Res, 15, 1566, 10.1101/gr.4252305

MA Beaumont, 1996, Evaluating loci for use in the genetic analysis of population structure, Proceedings of the Royal Society London B, 263, 1619, 10.1098/rspb.1996.0237

AR Boyko, 2008, Assessing the evolutionary impact of amino acid mutations in the human genome, PLoS Genet, 4, e1000083, 10.1371/journal.pgen.1000083

MK Kuhner, 2000, Usefulness of Single Nucleotide Polymorphism Data for Estimating Population Parameters, Genetics, 156, 439, 10.1093/genetics/156.1.439

P Beerli, 2001, Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach, Proceedings of the National Academy of Sciences USA, 98, 4563, 10.1073/pnas.081068098

J Hey, 2007, Integration within the Felsenstein equation for improved Markov chain Monte Carlo methods in population genetics, Proc Natl Acad Sci U S A, 104, 2785, 10.1073/pnas.0611164104

J Hey, 2010, Isolation with migration models for more than two populations, Mol Biol Evol, 27, 905, 10.1093/molbev/msp296

C Becquet, 2007, A new approach to estimate parameters of speciation models with application to apes, Genome Res, 17, 1505, 10.1101/gr.6409707

L Naduvilezhath, 2011, Jaatha: a fast composite-likelihood approach to estimate demographic parameters, Mol Ecol, 20, 2709, 10.1111/j.1365-294X.2011.05131.x

C Leuenberger, 2010, Bayesian computation and model selection without likelihoods, Genetics, 184, 243, 10.1534/genetics.109.109058

D Wegmann, 2009, Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood, Genetics, 182, 1207, 10.1534/genetics.109.102509

MA Beaumont, 2009, Adaptive approximate Bayesian computation, Biometrika, 96, 983, 10.1093/biomet/asp052

L Excoffier, 2005, Bayesian Analysis of an Admixture Model With Mutations and Arbitrarily Linked Markers, Genetics, 169, 1727, 10.1534/genetics.104.036236

MA Beaumont, 2002, Approximate Bayesian computation in population genetics, Genetics, 162, 2025, 10.1093/genetics/162.4.2025

R Nielsen, 2000, Estimation of population parameters and recombination rates from single nucleotide polymorphisms, Genetics, 154, 931, 10.1093/genetics/154.2.931

H Chen, 2012, The joint allele frequency spectrum of multiple populations: a coalescent theory approach, Theor Popul Biol, 81, 179, 10.1016/j.tpb.2011.11.004

GT Marth, 2004, The Allele Frequency Spectrum in Genome-Wide Human Variation Data Reveals Signals of Differential Demographic History in Three Large World Populations, Genetics, 166, 351, 10.1534/genetics.166.1.351

AM Adams, 2004, Maximum-likelihood estimation of demographic parameters using the frequency spectrum of unlinked single-nucleotide polymorphisms, Genetics, 168, 1699, 10.1534/genetics.104.030171

RN Gutenkunst, 2009, Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data, PLoS genetics, 5, e1000695, 10.1371/journal.pgen.1000695

D Garrigan, 2009, Composite likelihood estimation of demographic parameters, BMC genetics, 10, 72, 10.1186/1471-2156-10-72

S Lukic, 2011, Non-equilibrium allele frequency spectra via spectral methods, Theoretical population biology, 79, 203, 10.1016/j.tpb.2011.02.003

S Lukic, 2012, Demographic inference using spectral methods on SNP data, with an analysis of the human out-of-Africa expansion, Genetics, 192, 619, 10.1534/genetics.112.141846

H Li, 2011, Inference of human population history from individual whole-genome sequences, Nature, 475, 493, 10.1038/nature10231

I Gronau, 2011, Bayesian inference of ancient human demography from individual genome sequences, Nat Genet, 43, 1031, 10.1038/ng.937

S Myers, 2008, Can one learn history from the allelic spectrum?, Theoretical population biology, 73, 342, 10.1016/j.tpb.2008.01.001

S Gravel, 2011, Demographic history and rare allele sharing among human populations, Proc Natl Acad Sci U S A, 108, 11983, 10.1073/pnas.1019276108

JA Tennessen, 2012, Evolution and functional impact of rare coding variation from deep sequencing of human exomes, Science, 337, 64, 10.1126/science.1219240

V Sousa, 2013, Understanding the origin of species with genome-scale data: modelling gene flow, Nat Rev Genet, 14, 404, 10.1038/nrg3446

X Yi, 2010, Sequencing of 50 human exomes reveals adaptation to high altitude, Science, 329, 75, 10.1126/science.1190371

R Nielsen, 2012, SNP calling, genotype calling, and sample allele frequency estimation from New-Generation Sequencing data, PloS one, 7, e37558, 10.1371/journal.pone.0037558

RM Durbin, 2010, A map of human genome variation from population-scale sequencing, Nature, 467, 1061, 10.1038/nature09534

JE Crawford, 2012, Assessing the accuracy and power of population genetic inference from low-pass next-generation sequencing data, Front Genet, 3, 66, 10.3389/fgene.2012.00066

R Nielsen, 2011, Genotype and SNP calling from next-generation sequencing data, Nat Rev Genet, 12, 443, 10.1038/nrg2986

M Lynch, 2009, Estimation of allele frequencies from high-coverage genome-sequencing projects, Genetics, 182, 295, 10.1534/genetics.109.100479

SY Kim, 2011, Estimation of allele frequency and association mapping using next-generation sequencing data, BMC Bioinformatics, 12, 231, 10.1186/1471-2105-12-231

PL Johnson, 2006, Inference of population genetic parameters in metagenomics: a clean look at messy data, Genome Res, 16, 1320, 10.1101/gr.5431206

A Wollstein, 2010, Demographic history of Oceania inferred from genome-wide data, Current biology : CB, 20, 1983, 10.1016/j.cub.2010.10.040

A Albrechtsen, 2010, Ascertainment biases in SNP chips affect measures of population divergence, Mol Biol Evol, 27, 2534, 10.1093/molbev/msq148

AG Clark, 2005, Ascertainment bias in studies of human genome-wide polymorphism, Genome Res, 15, 1496, 10.1101/gr.4107905

N Patterson, 2012, Ancient admixture in human history, Genetics, 192, 1065, 10.1534/genetics.112.145037

Lu Y, Patterson N, Zhan Y, Mallick S, Reich D (2011) Technical design document for a SNP array that is optimized for population genetics. Available: ftp://ftp.cephb.fr/hgdp_supp10/8_12_2011_Technical_Array_Design_Document.pdf

R Nielsen, 2004, Reconstituting the Frequency Spectrum of Ascertained Single-Nucleotide Polymorphism Data, Genetics, 168, 2373, 10.1534/genetics.104.031039

JK Pickrell, 2012, The genetic prehistory of southern Africa, Nature communications, 3, 1143, 10.1038/ncomms2140

J Wakeley, 1997, Estimating ancestral population parameters, Genetics, 145, 847, 10.1093/genetics/145.3.847

L Excoffier, 2004, Patterns of DNA sequence diversity and genetic structure after a range expansion: lessons from the infinite-island model, Mol Ecol, 13, 853, 10.1046/j.1365-294X.2003.02004.x

NJ Fagundes, 2007, Statistical evaluation of alternative models of human evolution, Proc Natl Acad Sci U S A, 104, 17614, 10.1073/pnas.0708280104

F Zakharia, 2009, Characterizing the admixed African ancestry of African Americans, Genome Biol, 10, R141, 10.1186/gb-2009-10-12-r141

P Sjodin, 2012, Resequencing data provide no evidence for a human bottleneck in Africa during the penultimate glacial period, Mol Biol Evol, 29, 1851, 10.1093/molbev/mss061

BM Henn, 2011, Hunter-gatherer genomic diversity suggests a southern African origin for modern humans, Proc Natl Acad Sci U S A, 108, 5154, 10.1073/pnas.1017511108

H Akaike, 1974, New Look at Statistical-Model Identification, IEEE Transactions on Automatic Control, 19, 716, 10.1109/TAC.1974.1100705

KR Veeramah, 2012, An early divergence of KhoeSan ancestors from those of other modern humans is supported by an ABC-based analysis of autosomal resequencing data, Molecular biology and evolution, 29, 617, 10.1093/molbev/msr212

MF Hammer, 2011, Genetic evidence for archaic admixture in Africa, Proc Natl Acad Sci U S A, 108, 15123, 10.1073/pnas.1109300108

CM Schlebusch, 2012, Genomic Variation in Seven Khoe-San Groups Reveals Adaptation and Complex African History, Science, 338, 374, 10.1126/science.1227721

GJ Dimmendaal, 2008, Language Ecology and Linguistic Diversity on the African Continent, Language and Linguistics Compass, 840, 10.1111/j.1749-818X.2008.00085.x

C Ehret, 2001, Bantu expansions: Re-envisioning a central problem of early African history, International Journal of African Historical Studies, 34, 5, 10.2307/3097285

D Reich, 2010, Genetic history of an archaic hominin group from Denisova Cave in Siberia, Nature, 468, 1053, 10.1038/nature09710

M Meyer, 2012, A high-coverage genome sequence from an archaic Denisovan individual, Science, 338, 222, 10.1126/science.1224344

A Auton, 2007, Recombination rate estimation in the presence of hotspots, Genome Research, 17, 1219, 10.1101/gr.6386707

PA Jenkins, 2012, Genealogy-based methods for inference of historical recombination and gene flow and their application in Saccharomyces cerevisiae, PloS one, 7, e46947, 10.1371/journal.pone.0046947

R Nielsen, 2009, Darwinian and demographic forces affecting human protein coding genes, Genome Res, 19, 838, 10.1101/gr.088336.108

RD Hernandez, 2007, Context dependence, ancestral misidentification, and spurious signatures of natural selection, Mol Biol Evol, 24, 1792, 10.1093/molbev/msm108

C Varin, 2011, An Overview of Composite Likelihood Methods, Statistica Sinica, 21, 5

MA Beaumont, 2003, Estimation of population growth or decline in genetically monitored populations, Genetics, 164, 1139, 10.1093/genetics/164.3.1139

C Andrieu, 2009, The Pseudo-Marginal Approach for Efficient Monte Carlo Computations, Annals of Statistics, 37, 697, 10.1214/07-AOS574

A Kong, 2012, Rate of de novo mutations and the importance of father's age to disease risk, Nature, 488, 471, 10.1038/nature11396

A Scally, 2012, Revising the human mutation rate: implications for understanding human evolution, Nature reviews Genetics, 13, 745, 10.1038/nrg3295

S Li, 2012, Estimating demographic parameters from large-scale population genomic data using Approximate Bayesian Computation, BMC genetics, 13, 22, 10.1186/1471-2156-13-22

K Csillery, 2010, Approximate Bayesian Computation (ABC) in practice, Trends in ecology & evolution, 25, 410, 10.1016/j.tree.2010.04.001

JS Lopes, 2010, ABC: a useful Bayesian tool for the analysis of population data, Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases, 10, 826, 10.1016/j.meegid.2009.10.010

S Aeschbacher, 2012, A novel approach for choosing summary statistics in approximate Bayesian computation, Genetics, 192, 1027, 10.1534/genetics.112.143164

MA Nunes, 2010, On optimal selection of summary statistics for approximate Bayesian computation, Statistical applications in genetics and molecular biology, 9, Article34, 10.2202/1544-6115.1576

VC Sousa, 2009, Approximate Bayesian computation without summary statistics: the case of admixture, Genetics, 181, 1507, 10.1534/genetics.108.098129

P Beerli, 2004, Effect of unsampled populations on the estimation of population sizes and migration rates between sampled populations, Mol Ecol, 13, 827, 10.1111/j.1365-294X.2004.02101.x

M Slatkin, 2005, Seeing ghosts: the effect of unsampled populations on migration rates estimated for sampled populations, Mol Ecol, 14, 67, 10.1111/j.1365-294X.2004.02393.x

A Gelman, 1996, Posterior predictive assessment of model fitness via realized discrepancies, Statistica Sinica, 6, 733

Box GEP, Draper NR (1987) Empirical model-building and response surfaces. New York; Chichester etc.: J. Wiley. XIV, 669 pp.

XL Meng, 1993, Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika, 80, 267, 10.1093/biomet/80.2.267

Brent RP (1973) Algorithms for Minimization without Derivatives. Englewood Cliffs, NJ: Prentice-Hall.

Press WH, Teukolsky SA, Vetterling WT, Flannery BP (2007) Numerical Recipes in C++: The Art of Scientific Computing. Cambridge: Cambridge University Press. 1256 p.

L Excoffier, 2011, fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios, Bioinformatics, 27, 1332, 10.1093/bioinformatics/btr124

R Drmanac, 2010, Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays, Science, 327, 78, 10.1126/science.1181498

B O'Fallon, 2013, Purifying selection causes widespread distortions of genealogical structure on the human X chromosome, Genetics, 194, 485, 10.1534/genetics.113.152074

E Birney, 2004, Ensembl 2004, Nucleic acids research, 32, D468, 10.1093/nar/gkh038

Karolchik D, Hinrichs AS, Kent WJ (2012) The UCSC Genome Browser. Current Protocols in Bioinformatics / editorial board, Andreas D Baxevanis [et al.] Chapter 1: Unit 1.4.

MA Beaumont, 2004, Recent developments in genetic data analysis: what can they tell us about human demographic history?, Heredity, 92, 365, 10.1038/sj.hdy.6800447

J Wakeley, 1999, Nonequilibrium migration in human history, Genetics, 153, 1863, 10.1093/genetics/153.4.1863

JB Johnson, 2004, Model selection in ecology and evolution, Trends in ecology & evolution, 19, 101, 10.1016/j.tree.2003.10.013

L Zhu, 2005, A composite-likelihood approach for detecting directional selection from DNA sequence data, Genetics, 170, 1411, 10.1534/genetics.104.035097

C Varin, 2005, A note on composite likelihood inference and model selection, Biometrika, 92, 519, 10.1093/biomet/92.3.519