A brief introduction to mixed effects modelling and multi-model inference in ecology

PeerJ - Volume 6 - Page e4794
Xavier A. Harrison [1], Lynda Donaldson [2,3], Maria Correa-Cano [2], Julian Evans [4,5], David N. Fisher [4,6], Cecily Goodwin [2], Beth Robinson [2,7], David J. Hodgson [4], Richard Inger [4,2]
[1] Institute of Zoology, Zoological Society of London, London, UK
[2] Environment and Sustainability Institute, University of Exeter, Penryn, UK
[3] Wildfowl and Wetlands Trust, Slimbridge, Gloucestershire, UK
[4] Centre for Ecology and Conservation, University of Exeter, Penryn, UK
[5] Department of Biology, University of Ottawa, Ottawa, ON, Canada
[6] Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
[7] WildTeam Conservation, Padstow, UK

Abstract

The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.
