Robustness of linear mixed‐effects models to violations of distributional assumptions

Methods in Ecology and Evolution - Volume 11, Issue 9 - Pages 1141-1152 - 2020
Holger Schielzeth1, Niels J. Dingemanse2, Shinichi Nakagawa3, David F. Westneat4, Hassen Allegue5, Céline Teplitsky6, Denis Réale5, Ned A. Dochtermann7, László Zsolt Garamszegi8,9, Yimen G. Araya‐Ajoy10
1Institute of Ecology and Evolution, Friedrich Schiller University, Jena, Germany
2Behavioural Ecology, Department of Biology, Ludwig-Maximilians University of Munich, Planegg-Martinsried, Germany
3Evolution & Ecology Research Centre, and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, Australia
4Department of Biology, University of Kentucky, Lexington, KY, USA
5Département des sciences biologiques, Université du Québec à Montréal, Montreal, QC, Canada
6Centre d’Ecologie Fonctionnelle et Evolutive, CNRS, Montpellier, France
7Department of Biological Sciences, North Dakota State University, Fargo, ND, USA
8Centre for Ecological Research, Institute of Ecology and Botany, Vácrátót, Hungary
9MTA‐ELTE Theoretical Biology and Evolutionary Ecology Research Group, Department of Plant Systematics, Ecology and Theoretical Biology, Eötvös Loránd University, Budapest, Hungary
10Centre for Biodiversity Dynamics (CBD), Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway

Abstract

Linear mixed‐effects models are powerful tools for analysing complex datasets with repeated or clustered observations, a common data structure in ecology and evolution. Mixed‐effects models involve complex fitting procedures and make several assumptions, in particular about the distribution of residual and random effects. Violations of these assumptions are common in real datasets, yet it is not always clear how much these violations matter to accurate and unbiased estimation.

Here we address the consequences of violations of distributional assumptions and the impact of missing random effect components on model estimates. In particular, we evaluate the effects of skewed, bimodal and heteroscedastic random effect and residual variances, of missing random effect terms and of correlated fixed effect predictors. We focus on bias and prediction error in estimates of fixed and random effects.
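As an illustration of what skewed and bimodal random effects can look like in a simulation, the following minimal sketch draws zero-mean deviations with a chosen variance from a shifted gamma distribution and from an equal mixture of two normals. This is not the authors' simulation code; the function names, the gamma shape of 2 and the mode separation of 0.9 standard deviations are assumptions made purely for illustration.

```python
import numpy as np

def skewed_effects(rng, n, variance, shape=2.0):
    """Zero-mean, right-skewed deviations from a shifted gamma distribution."""
    scale = np.sqrt(variance / shape)      # gamma variance = shape * scale**2
    return rng.gamma(shape, scale, size=n) - shape * scale  # subtract the gamma mean

def bimodal_effects(rng, n, variance, separation=0.9):
    """Zero-mean bimodal deviations from an equal mixture of two normals."""
    d = separation * np.sqrt(variance)     # distance of each mode from zero
    s = np.sqrt(variance - d**2)           # within-mode SD so the total variance matches
    signs = rng.choice([-1.0, 1.0], size=n)
    return signs * d + rng.normal(0.0, s, size=n)

rng = np.random.default_rng(1)
target_variance = 0.5                      # hypothetical random effect variance
skewed = skewed_effects(rng, 200_000, target_variance)
bimodal = bimodal_effects(rng, 200_000, target_variance)

# Both samples should have mean near 0 and variance near the target.
print(skewed.mean(), skewed.var())
print(bimodal.mean(), bimodal.var())
```

Matching the mean and variance across distributions keeps the true variance component fixed, so any difference in model estimates can be attributed to the shape of the distribution rather than to its scale.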

Model estimates were usually robust to violations of assumptions, with the exception of slight upward biases in estimates of random effect variance if the generating distribution was bimodal but was modelled by Gaussian error distributions. Further, estimates for (random effect) components that violated distributional assumptions became less precise but remained unbiased. However, this particular problem did not affect other parameters of the model. The same pattern was found for strongly correlated fixed effects, which led to imprecise, but unbiased estimates, with uncertainty estimates reflecting imprecision.

Unmodelled sources of random effect variance had predictable effects on variance component estimates. The pattern is best viewed as a cascade of hierarchical grouping factors. Variances trickle down the hierarchy such that missing higher‐level random effect variances pool at lower levels and missing lower‐level and crossed random effect variances manifest as residual variance.
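The trickle-down pattern can be sketched with a small balanced simulation in which individuals are nested within groups. The sketch below uses classical ANOVA (method-of-moments) variance component estimators rather than the likelihood-based mixed models evaluated in the paper, and all sample sizes and variances are hypothetical values chosen for illustration. When the group level is omitted, the individual-level estimate should come out close to the sum of the group and individual variances, while the residual estimate is largely unaffected.

```python
import numpy as np

rng = np.random.default_rng(42)
n_groups, n_ind, n_rep = 200, 5, 4                 # hypothetical balanced design
var_group, var_ind, var_resid = 1.0, 0.5, 0.25     # hypothetical true variances

# Simulate y = group effect + individual effect + residual,
# stored with shape (groups, individuals, repeats).
group = rng.normal(0.0, np.sqrt(var_group), size=(n_groups, 1, 1))
ind = rng.normal(0.0, np.sqrt(var_ind), size=(n_groups, n_ind, 1))
resid = rng.normal(0.0, np.sqrt(var_resid), size=(n_groups, n_ind, n_rep))
y = group + ind + resid

# Mean squares of the balanced nested ANOVA.
grand_mean = y.mean()
group_means = y.mean(axis=(1, 2))
ind_means = y.mean(axis=2)
ms_group = n_ind * n_rep * np.sum((group_means - grand_mean) ** 2) / (n_groups - 1)
ms_ind = n_rep * np.sum((ind_means - group_means[:, None]) ** 2) / (n_groups * (n_ind - 1))
ms_resid = np.sum((y - ind_means[:, :, None]) ** 2) / (n_groups * n_ind * (n_rep - 1))

# Full model: both grouping levels are accounted for.
full_group = (ms_group - ms_ind) / (n_ind * n_rep)
full_ind = (ms_ind - ms_resid) / n_rep

# Reduced model: group level omitted, individuals treated as one flat factor.
n_total_ind = n_groups * n_ind
ms_ind_flat = n_rep * np.sum((ind_means - grand_mean) ** 2) / (n_total_ind - 1)
reduced_ind = (ms_ind_flat - ms_resid) / n_rep

print("full model:    group", full_group, "individual", full_ind, "residual", ms_resid)
print("group omitted: individual", reduced_ind, "residual", ms_resid)
```

In this balanced layout the residual mean square is the same in both analyses, which makes it easy to see that the unmodelled group variance is absorbed by the individual-level component rather than by the residual.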

Overall, our results show remarkable robustness of mixed‐effects models that should allow researchers to use mixed‐effects models even if the distributional assumptions are objectively violated. However, this does not free researchers from careful evaluation of the model. Estimates that are based on data that show clear violations of key assumptions should be treated with caution because individual datasets might give highly imprecise estimates, even if they will be unbiased on average across datasets.
