Water quality predictions through linear regression - A brute force algorithm approach

MethodsX - Volume 10 - Page 102153 - 2023
A.C.P. Fernandes1, A.R. Fonseca2, F.A.L. Pacheco3, L.F. Sanches Fernandes2
1Centre for Natural Resources and Environment (CERENA/FEUP), Engineering Faculty, University of Porto, Rua Dr. Roberto Frias, Porto 4200-465, Portugal
2Centro de Investigação e Tecnologias Agroambientais e Biológicas, Universidade de Trás-os-Montes e Alto Douro, Ap 1013, Vila Real 5001-801, Portugal
3Centro de Química de Vila Real, Universidade de Trás-os-Montes e Alto Douro, Ap 1013, Vila Real 5001-801, Portugal
