The program structure does not reliably recover the correct population structure when sampling is uneven: subsampling and new estimators alleviate the problem

Molecular Ecology Resources - Volume 16 Issue 3 - Pages 608-627 - 2016
Sébastien J. Puechmaille
School of Biology and Environmental Sciences, University College Dublin, Belfield, Dublin 4, Ireland

Abstract

Inferences of population structure, and more precisely the identification of genetically homogeneous groups of individuals, are essential to the fields of ecology, evolutionary biology and conservation biology. Such inferences are routinely made with the program structure, which implements a Bayesian algorithm to identify groups of individuals at Hardy–Weinberg and linkage equilibrium. While the method performs relatively well under various population models when sampling is even across subpopulations, its robustness to uneven sample sizes between subpopulations and/or to hierarchical levels of population structure has not yet been tested, although both situations are common in empirical data sets. In this study, I used simulated and empirical microsatellite data sets to investigate how uneven sample sizes between subpopulations and/or hierarchical levels of population structure affect the detected population structure. The results demonstrated that uneven sampling often leads to wrong inferences of hierarchical structure and to downward-biased estimates of the true number of subpopulations. Distinct subpopulations with reduced sampling tended to be merged, while individuals from extensively sampled subpopulations were generally split, despite belonging to the same panmictic population. Four new supervised methods to detect the number of clusters were developed and tested as part of this study and were found to outperform the existing methods on both evenly and unevenly sampled data sets. Additionally, a subsampling strategy aimed at reducing sampling unevenness between subpopulations is presented and tested. Altogether, these results demonstrate that when sampling evenness is accounted for, the detection of the correct population structure is greatly improved.
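
The abstract does not spell out how the supervised estimators or the subsampling strategy are computed. The Python sketch below shows one plausible reading under stated assumptions, not the paper's exact procedure. It assumes that every individual carries a predefined sampling-group label, and that "supervised" means these labels are used. Subsampling randomly caps every group at a hypothetical `max_per_group`. For a single structure run at a given K, the estimate is the number of clusters in which at least one group's mean (or median) membership coefficient exceeds a threshold; the value 0.5 is an assumed default. Per-run values are then summarized across runs by their median or maximum. All function names, the threshold and the combination rule are illustrative assumptions.

```python
import random
from collections import defaultdict
from statistics import mean, median


def subsample_evenly(individuals, groups, max_per_group, seed=None):
    """Randomly keep at most max_per_group individuals from each sampling group.

    Hypothetical helper: groups[i] is the predefined sampling label of individuals[i].
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for ind, grp in zip(individuals, groups):
        by_group[grp].append(ind)
    kept = []
    for grp in sorted(by_group):
        members = by_group[grp]
        if len(members) > max_per_group:
            members = rng.sample(members, max_per_group)  # sample without replacement
        kept.extend(members)
    return kept


def k_for_run(q_matrix, groups, threshold=0.5, within="mean"):
    """Count clusters to which at least one sampling group is assigned in one run.

    q_matrix[i] holds the K membership coefficients of individual i.
    A group is assigned to cluster c when its mean (or median) membership in c
    exceeds threshold (0.5 is an assumed default, not taken from the source).
    """
    summarize = mean if within == "mean" else median
    by_group = defaultdict(list)
    for row, grp in zip(q_matrix, groups):
        by_group[grp].append(row)
    n_clusters = len(q_matrix[0])
    occupied = set()
    for rows in by_group.values():
        for c in range(n_clusters):
            if summarize([row[c] for row in rows]) > threshold:
                occupied.add(c)
    return len(occupied)


def combine_runs(k_values, across="median"):
    """Summarize per-run estimates by their median or their maximum."""
    return max(k_values) if across == "max" else median(k_values)


if __name__ == "__main__":
    # Toy data: three sampling groups, K = 3 clusters, one run.
    groups = ["A", "A", "B", "B", "C"]
    q = [
        [0.90, 0.05, 0.05],
        [0.80, 0.10, 0.10],
        [0.10, 0.85, 0.05],
        [0.20, 0.70, 0.10],
        [0.40, 0.30, 0.30],  # group C is not assigned to any cluster
    ]
    print(k_for_run(q, groups))                   # 2
    print(combine_runs([2, 3, 3]))                # 3
    print(combine_runs([2, 2, 4], across="max"))  # 4
```

Crossing the within-group summary (mean or median) with the across-run summary (median or maximum) yields four variants. This is one assumed way that four estimators could arise. Under this rule, a cluster counts as soon as a single group is assigned to it, however few individuals that group contains. That gives an intuition for why group-based estimators may be less sensitive to uneven sampling.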
