Evaluation of the performance of data-driven approaches for filling monthly precipitation gaps in a semi-arid climate conditions

Acta Geophysica, Vol. 71, pp. 2265-2285, 2022
Okan Mert Katipoğlu¹
¹ Department of Civil Engineering, Erzincan Binali Yıldırım University, Erzincan, Turkey

Abstract

Missing data cause problems in meteorological, hydrological, and climate analyses. To make research accurate and reliable, observation records should be complete and cover long periods. In recent years, artificial intelligence techniques have attracted interest for completing incomplete meteorological data. This study investigated the ability of several machine learning models to fill in missing precipitation data, namely artificial neural networks, the nonlinear autoregressive with exogenous input (NARX) model, support vector regression, Gaussian process regression, boosted trees, bagged trees (BAT), and linear regression. In developing the models, 70% of the dataset was used for training, 15% for testing, and 15% for validation. Data from the Bayburt, Tercan, and Zara precipitation stations, which are the closest to the Erzincan station and have the highest correlation coefficients with it, were used to fill the gaps. The accuracy of the models was assessed with statistical criteria, namely root-mean-square error (RMSE), mean absolute error (MAE), the Nash–Sutcliffe model efficiency coefficient (NSE), and the coefficient of determination (R²), and with graphical approaches such as scatter plots, box plots, violin plots, and Taylor diagrams. Comparison of the model results showed that the BAT model (R² = 0.79, NSE = 0.79, RMSE = 11.42, MAE = 7.93) was the most successful at completing missing monthly precipitation data. This research can help in choosing the most accurate method for estimating precipitation data in semi-arid regions such as Erzincan.
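The abstract identifies bagged trees (BAT) as the best gap filler. To illustrate that technique, the following minimal, from-scratch Python sketch performs bootstrap aggregation of regression trees. It assumes that a missing monthly value at the target station is predicted from same-month totals at three neighbouring stations. This is not the author's implementation. The function names (`fit_tree`, `fit_bagged_trees`, `predict_bagged`), the tree depth, the number of trees, and the synthetic data are all hypothetical.

```python
import random


def _best_split(X, y, min_leaf):
    """Return (sse, feature, threshold) of the lowest-error split, or None."""
    n = len(y)
    total, total_sq = sum(y), sum(v * v for v in y)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        s = sq = 0.0
        for k in range(n - 1):  # the first k+1 sorted samples go left
            i = order[k]
            s += y[i]
            sq += y[i] * y[i]
            n_left = k + 1
            if n_left < min_leaf or n - n_left < min_leaf:
                continue
            if X[i][j] == X[order[k + 1]][j]:
                continue  # cannot separate equal feature values
            # SSE of each side from running sums: sum(y^2) - (sum y)^2 / count
            sse = (sq - s * s / n_left) + (
                (total_sq - sq) - (total - s) ** 2 / (n - n_left)
            )
            if best is None or sse < best[0]:
                best = (sse, j, X[i][j])
    return best


def fit_tree(X, y, max_depth=4, min_leaf=2):
    """Grow a regression tree; a leaf is the mean target of its samples."""
    leaf = sum(y) / len(y)
    if max_depth == 0 or len(set(y)) == 1:
        return leaf
    split = _best_split(X, y, min_leaf)
    if split is None:
        return leaf
    _, j, t = split
    left = [i for i in range(len(y)) if X[i][j] <= t]
    right = [i for i in range(len(y)) if X[i][j] > t]
    return (j, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_leaf),
            fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_leaf))


def predict_tree(node, x):
    """Walk internal nodes (feature, threshold, left, right) down to a leaf value."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def fit_bagged_trees(X, y, n_trees=25, seed=0, **tree_kw):
    """Bagging: fit each tree on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_tree([X[i] for i in idx], [y[i] for i in idx], **tree_kw))
    return trees


def predict_bagged(trees, x):
    """Average the individual tree predictions."""
    preds = [predict_tree(t, x) for t in trees]
    return sum(preds) / len(preds)


if __name__ == "__main__":
    # Synthetic monthly totals at three hypothetical neighbour stations
    rng = random.Random(42)
    X, y = [], []
    for _ in range(120):
        a, b, c = rng.uniform(0, 80), rng.uniform(0, 80), rng.uniform(0, 80)
        X.append([a, b, c])
        y.append(max(0.0, 0.4 * a + 0.3 * b + 0.2 * c + rng.gauss(0, 3)))
    n_train = int(0.7 * len(y))  # 70% for training; the rest would be test/validation
    trees = fit_bagged_trees(X[:n_train], y[:n_train], n_trees=25, seed=1)
    gap = X[-1]  # treat the last month's target value as missing
    print(round(predict_bagged(trees, gap), 1), "vs observed", round(y[-1], 1))
```

Averaging many trees grown on bootstrap resamples reduces the variance of a single tree, which is the usual motivation for bagging. In practice, a tested library implementation would replace this sketch.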

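The following minimal sketch computes the four statistical criteria named in the abstract (RMSE, MAE, NSE, and R²). It assumes paired observed and estimated monthly totals with no missing values. It also takes R² as the squared Pearson correlation coefficient, which is one common definition and may differ from the one used in the study. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(obs, sim):
    """Return RMSE, MAE, NSE and R^2 for paired observed/estimated values."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(obs)
    errors = [s - o for o, s in zip(obs, sim)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    if ss_obs == 0 or ss_sim == 0:
        raise ValueError("series must not be constant")
    # NSE: 1 minus residual sum of squares over variance of the observations
    nse = 1.0 - ss_res / ss_obs
    # R^2 assumed here to be the squared Pearson correlation coefficient
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    r2 = cov * cov / (ss_obs * ss_sim)
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "R2": r2}


if __name__ == "__main__":
    print(evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0]))
```

NSE equals 1 for a perfect estimate and drops below 0 when the estimate is worse than simply using the observed mean. That is why the study reports it alongside the error-based RMSE and MAE.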