Random Forest and Logistic Regression algorithms for prediction of groundwater contamination using ammonia concentration

Arabian Journal of Geosciences - Tập 15 - Trang 1-12 - 2022
Ahmed Madani1, Mohammed Hagage2, Salwa F. Elbeih2
1Department of Geology, Faculty of Science, Cairo University, Giza, Egypt
2National Authority for Remote Sensing and Space Sciences (NARSS), Cairo, Egypt

Tóm tắt

The present study aims to develop an efficient predictive model for groundwater contamination using Multivariate Logistic Regression (MLR) and Random Forest (RF) algorithms. Contamination by ammonia is recorded by many authors at Sohag Governorate, Egypt and is attributed to urban growth, agricultural, and industrial activities. Thirty-two groundwater samples representing the Quaternary aquifer are collected and analyzed for major cations (Ca, Mg, and Na), ammonia, nitrate, phosphate, and heavy metals. Lead, magnesium, iron, and zinc variables are used to test the model with ammonia which is used as an index to the groundwater contamination. Spatial distribution maps and statistical analyses show a strong correlation of ammonia with lead and magnesium variables whereas iron and zinc show less correlation. For Random Forest (RF) model, the data is divided into 70% training and 30% testing subsets. The performance of the model is evaluated using the classification reports, and the confusion matrix. Results show (1) high performance of RF model to groundwater contamination with an accuracy of 93% and (2) the MLR accuracy increased from 70 to 83% when “SOLVER” and “C” parameters are modified. The study helps to identify the contaminated zones at the study area and proved the usefulness of the machine learning models for prediction of the groundwater contamination using the ammonia concentration.

Tài liệu tham khảo