A Redefined Variance Inflation Factor: Overcoming the Limitations of the Variance Inflation Factor

Román Salmerón-Gómez1, Catalina B. García-García1, José García-Pérez2
1Universidad de Granada, Granada, Spain
2Department of Economic and Business, University of Almería, Almería, Spain

Abstract

The variance inflation factor (VIF) is one of the most widely applied tools for diagnosing the possible existence of multicollinearity in a multiple linear regression model. However, the VIF detects only relationships among the independent variables, ignoring the intercept, and is not appropriate for binary variables. In addition, the orthogonal model from which it is calculated is itself controversial. These limitations are usually overlooked when the VIF is calculated, which may lead to misleading conclusions. This paper starts from an alternative orthogonal model to present a redefined variance inflation factor (RVIF) that overcomes these limitations. The method is implemented in the rvif R package (Salmerón and García in rvif: collinearity detection using redefined variance inflation factor and graphical methods [Computer software manual]. https://cran.r-project.org/package=rvif , 2022). A Monte Carlo simulation is performed to provide a threshold for this new measure. The contribution of this paper is illustrated with several examples. The RVIF is also compared with the vif command from the car R package, which calculates the VIF; the comparison suggests that it would be advisable to warn non-statisticians about the controversial use of the VIF.
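The intercept limitation mentioned in the abstract can be illustrated with a minimal sketch. The code below computes the classical VIF (not the paper's RVIF, whose definition is not given here) with NumPy, under the assumption that each regressor is regressed on the remaining regressors plus a constant. The helper names `vif` and `scaled_condition_number`, the simulated data, and the use of a column-scaled condition number as a second diagnostic are illustrative choices, not taken from the paper or from the rvif package.

```python
import numpy as np


def vif(X):
    """Classical VIF for each column of X (n x p, without an intercept column).

    Column j is regressed on the other columns plus a constant, and
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])  # auxiliary regression design
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def scaled_condition_number(X):
    """Condition number of [1, X] after scaling every column to unit length."""
    X = np.asarray(X, dtype=float)
    D = np.column_stack([np.ones(X.shape[0]), X])
    D = D / np.linalg.norm(D, axis=0)
    return np.linalg.cond(D)


rng = np.random.default_rng(0)  # hypothetical simulated data
n = 200
x1 = rng.normal(100.0, 0.01, n)      # almost constant: nearly collinear with the intercept
x2 = rng.normal(0.0, 1.0, n)         # unrelated to x1
x3 = x2 + rng.normal(0.0, 0.05, n)   # nearly collinear with x2

vif_intercept_case = vif(np.column_stack([x1, x2]))   # stays close to 1
vif_slope_case = vif(np.column_stack([x2, x3]))       # very large
cond_intercept_case = scaled_condition_number(np.column_stack([x1, x2]))  # very large

print(vif_intercept_case, vif_slope_case, cond_intercept_case)
```

In this sketch the VIF flags the relationship between `x2` and `x3` but stays near 1 for `x1`, even though `x1` is almost proportional to the constant term, while the condition number of the scaled design matrix (intercept included) is large. This matches the abstract's point that the VIF does not account for the intercept.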

References

Belsley, D. (1991). A guide to using the collinearity diagnostics. Computational Science in Economics and Management, 4, 33–50.
Belsley, D., Kuh, E., & Welsch, R. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. John Wiley & Sons.
Belsley, D. A. (1984). Demeaning conditioning diagnostics through centering. The American Statistician, 38(2), 73–77.
Cobb, C., & Douglas, P. (1928). A theory of production. American Economic Review, 18, 139–165.
Cobb, C., & Douglas, P. (1948). Are there laws of production? The American Economic Review, 38, 1–41.
Cook, R. (1984). Comment: Demeaning conditioning diagnostics through centering. The American Statistician, 38(2), 78–79.
Douglas, P. (1934). A theory of wages. The Macmillan Co.
Fox, J., & Weisberg, S. (2019). An R companion to applied regression (3rd ed.). https://socialsciences.mcmaster.ca/jfox/Books/Companion/
García, C. B., Salmerón, R., García, C., & García, J. (2019). Residualization: Justification, properties and application. Journal of Applied Statistics. https://doi.org/10.1080/02664763.2019.1701638
Gibbons, D. (1981). A simulation study of some ridge estimators. Journal of the American Statistical Association, 76, 131–139.
Hendrickx, J. (2012). perturb: Tools for evaluating collinearity [Computer software manual]. R package version 2.05. https://CRAN.R-project.org/package=perturb
Johnston, J. (1972). Econometric methods. McGraw-Hill.
Kibria, B. (2003). Performance of some new ridge regression estimators. Communications in Statistics-Simulation and Computation, 32(2), 419–435.
Marquardt, D. W., & Snee, R. D. (1975). Ridge regression in practice. The American Statistician, 29(1), 3–20.
McDonald, G., & Galarneau, D. (1975). A Monte Carlo evaluation of some ridge type estimators. Journal of the American Statistical Association, 70, 407–416.
Novales, A. (1993). Econometría. McGraw-Hill.
O’Brien, R. (2007). A caution regarding rules of thumb for variance inflation factors. Quality & Quantity, 41, 673–690.
Olva Maldonado, H. (2009). Análisis de la función de producción Cobb-Douglas y su aplicación en el sector productivo mexicano. Tesis profesional, Universidad Autónoma de Chapingo.
R Core Team. (2022). R: A language and environment for statistical computing [Computer software manual]. https://www.R-project.org/
Rodríguez Sánchez, A., Salmerón Gómez, R., & García García, C. (2021). Obtaining a threshold for the Stewart index and its extension to ridge regression. Computational Statistics, 36, 1011–1029.
Rodríguez Sánchez, A., Salmerón Gómez, R., & García García, C. (2022). The coefficient of determination in the ridge regression. Communications in Statistics-Simulation and Computation, 51(1), 201–219.
Salmerón, R., & García, C. (2022). rvif: Collinearity detection using redefined variance inflation factor and graphical methods [Computer software manual]. https://cran.r-project.org/package=rvif
Salmerón, R., García, C., & García, J. (2018). Variance inflation factor and condition number in multiple linear regression. Journal of Statistical Computation and Simulation, 88, 2365–2384.
Salmerón, R., García, C., & García, J. (2019). multiColl: Collinearity detection in a multiple linear regression model [Computer software manual]. R package version 1.0. https://CRAN.R-project.org/package=multiColl
Salmerón, R., García, C., & García, J. (2021). A guide to using the R package multiColl for detecting multicollinearity. Computational Economics, 57, 529–536. https://doi.org/10.1007/s10614-019-09967-y
Salmerón, R., García, C., & García, J. (2021). The multiColl package versus other existing packages in R to detect multicollinearity. Computational Economics. https://doi.org/10.1007/s10614-021-10154-1
Salmerón, R., García, J., García, C., & López, M. (2018). Transformation of variables and the condition number in ridge estimation. Computational Statistics, 33, 1497–1524.
Salmerón Gómez, R., García García, C., & García Pérez, J. (2020). Comment on a “note on collinearity diagnostics and centering” by Velilla (2018). The American Statistician, 74(1), 68–71.
Salmerón Gómez, R., García García, C., & García Pérez, J. (2020). Detection of near-multicollinearity through centered and noncentered regression. Mathematics, 8(6), 931.
Salmerón Gómez, R., García Pérez, J., López Martín, M. D. M., & García García, C. (2016). Collinearity diagnostic in ridge estimation through the variance inflation factor. Journal of Applied Statistics, 43(10), 1831–1849.
Salmerón-Gómez, R., Rodríguez-Sánchez, A., & García-García, C. (2019). Diagnosis and quantification of the non-essential collinearity. Computational Statistics, 35, 647–666.
Salmerón Gómez, R., Rodríguez Sánchez, A., García García, C., & García Pérez, J. (2020). The VIF and MSE in raise regression. Mathematics, 8(4), 605.
Stewart, G. W. (1987). Collinearity and least squares regression. Statistical Science, 2, 68–84.
Theil, H. (1971). Principles of econometrics. John Wiley and Sons.
Vanhove, J. (2020). Collinearity isn’t a disease that needs curing.
Wichern, D., & Churchill, G. (1978). A comparison of ridge estimators. Technometrics, 20, 301–311.