Dsuite ‐ Fast D‐statistics and related admixture evidence from VCF files

Molecular Ecology Resources, Volume 21, Issue 2, Pages 584-595, 2021
Milan Malinsky1, Michael Matschiner2,3, Hannes Svardal4,5
1Zoological Institute, University of Basel, Basel, Switzerland
2Department of Biosciences, University of Oslo, Oslo, Norway
3Department of Paleontology and Museum, University of Zurich, Zurich, Switzerland
4Department of Biology, University of Antwerp, Antwerp, Belgium
5Naturalis Biodiversity Center, Leiden, The Netherlands

Abstract

Patterson's D, also known as the ABBA‐BABA statistic, and related statistics such as the f4‐ratio, are commonly used to assess evidence of gene flow between populations or closely related species. Currently available implementations often require custom file formats, implement only small subsets of the available statistics, and are too computationally inefficient to evaluate all gene flow hypotheses across data sets with many populations or species. Here, we present a new software package, Dsuite, an efficient implementation allowing genome‐scale calculations of the D and f4‐ratio statistics across all combinations of tens or hundreds of populations or species directly from a variant call format (VCF) file. Our program also implements statistics suited for application to genomic windows, providing evidence of whether introgression is confined to specific loci, and it can aid in the interpretation of a system of f4‐ratio results with the use of the “f‐branch” method. Dsuite is available at https://github.com/millanek/Dsuite, is straightforward to use, is substantially more computationally efficient than comparable programs, and provides a convenient suite of tools and statistics, including some not previously available in any software package. Thus, Dsuite facilitates the assessment of evidence for gene flow, especially across larger genomic data sets.
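As an illustration of the ABBA‐BABA statistic named in the abstract, the following minimal Python sketch computes Patterson's D for one trio (P1, P2, P3) plus an outgroup (O) from per-site derived-allele frequencies, using the standard frequency-based definition. It is not Dsuite's code and does not read VCF files; the function name `patterson_d`, the `Site` tuple layout, and the example frequencies are assumptions made for this sketch.

```python
from typing import Sequence, Tuple

# Hypothetical layout: derived-allele frequencies in (P1, P2, P3, outgroup)
Site = Tuple[float, float, float, float]


def patterson_d(sites: Sequence[Site]) -> float:
    """Return Patterson's D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA)."""
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, po in sites:
        # ABBA: derived allele shared by P2 and P3, ancestral in P1 and O
        abba += (1 - p1) * p2 * p3 * (1 - po)
        # BABA: derived allele shared by P1 and P3, ancestral in P2 and O
        baba += p1 * (1 - p2) * p3 * (1 - po)
    total = abba + baba
    if total == 0:
        raise ValueError("no informative (ABBA or BABA) sites")
    return (abba - baba) / total


# Made-up example data: three ABBA sites, one BABA site, one uninformative site
example_sites = [
    (0.0, 1.0, 1.0, 0.0),
    (0.0, 1.0, 1.0, 0.0),
    (0.0, 1.0, 1.0, 0.0),
    (1.0, 0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0, 0.0),
]
d_value = patterson_d(example_sites)
print(d_value)  # 0.5
```

A positive D indicates an excess of allele sharing between P2 and P3, and a negative D an excess between P1 and P3. Assessing whether D differs significantly from zero needs a separate step, such as a block jackknife across the genome, which this sketch omits.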
