A brief introduction to mixed effects modelling and multi-model inference in ecology
Abstract
The use of linear mixed effects models (LMMs) is increasingly common in the analysis of biological data. Whilst LMMs offer a flexible approach to modelling a broad range of data types, ecological data are often complex and require complex model structures, and the fitting and interpretation of such models is not always straightforward. The ability to achieve robust biological inference requires that practitioners know how and when to apply these tools. Here, we provide a general overview of current methods for the application of LMMs to biological data, and highlight the typical pitfalls that can be encountered in the statistical modelling process. We tackle several issues regarding methods of model selection, with particular reference to the use of information theory and multi-model inference in ecology. We offer practical solutions and direct the reader to key references that provide further technical detail for those seeking a deeper understanding. This overview should serve as a widely accessible code of best practice for applying LMMs to complex biological problems and model structures, and in doing so improve the robustness of conclusions drawn from studies investigating ecological and evolutionary questions.