Optimization of negative sample selection for landslide susceptibility mapping based on machine learning using K-means-KNN algorithm

Springer Science and Business Media LLC - Volume 16 - Pages 4131-4152 - 2023
Chao Liu¹
¹State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu, China

Abstract

Sample quality plays a vital role in building accurate machine learning models, and regional landslide susceptibility evaluation is no exception. Previous studies have mostly selected negative (non-landslide) samples by random generation, which often fails to yield representative samples. This study therefore proposes the KK-sampling method, which uses the K-means and KNN algorithms to analyze relevant attributes of the study area and select samples. To evaluate the method, MLP, RF, and XGBoost models were trained with KK-sampling, using Zhong County, Chongqing, as a case study. The results indicate that KK-sampling significantly improves the stability and accuracy of the models. The study also analyzes the importance of landslide factors in Zhong County using SHAP values. The findings provide a reference for establishing a reasonable and effective landslide susceptibility model in the region.
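
The abstract does not spell out how K-means and KNN are combined, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes that grid cells are first clustered by their attributes with K-means, that candidate negatives are drawn from the cluster with the lowest landslide density, and that a KNN check then discards any candidate with a recorded landslide among its nearest neighbours. The function names (`kmeans`, `kk_negative_samples`), the parameters, and the toy data are all hypothetical. The code uses only the Python standard library to stay self-contained; a real study would more likely rely on an established library implementation.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def kk_negative_samples(features, is_landslide, n_clusters=3,
                        k_neighbors=3, n_samples=10, seed=0):
    """Assumed KK-style selection: K-means to find the least landslide-prone
    cluster, then a KNN screen to drop candidates near known landslides."""
    labels = kmeans(features, n_clusters, seed=seed)

    # Landslide density of each non-empty cluster.
    density = {}
    for c in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == c]
        density[c] = sum(is_landslide[i] for i in members) / len(members)
    safest = min(density, key=density.get)

    # Candidates: non-landslide cells inside the safest cluster.
    candidates = [i for i, lab in enumerate(labels)
                  if lab == safest and not is_landslide[i]]

    # KNN screen: keep a candidate only if none of its k nearest
    # neighbours (in attribute space) is a landslide cell.
    kept = []
    for i in candidates:
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: math.dist(features[i], features[j]))
        if not any(is_landslide[j] for j in others[:k_neighbors]):
            kept.append(i)

    rng = random.Random(seed)
    return sorted(rng.sample(kept, min(n_samples, len(kept))))


# Toy study area (hypothetical): 25 stable cells, 16 landslide cells, and
# 3 non-landslide cells that sit inside the landslide-prone zone.
stable = [(x / 10, y / 10) for x in range(5) for y in range(5)]
slides = [(0.8 + x / 20, 0.8 + y / 20) for x in range(4) for y in range(4)]
intruders = [(0.82, 0.82), (0.88, 0.93), (0.93, 0.87)]
features = stable + slides + intruders
is_landslide = [False] * 25 + [True] * 16 + [False] * 3

selected = kk_negative_samples(features, is_landslide,
                               n_clusters=2, k_neighbors=3, n_samples=5)
print(selected)
```

In this toy setup, purely random selection could pick one of the three non-landslide cells that resemble landslide cells, whereas the screened selection draws only from cells whose attributes are far from any recorded landslide. That is the kind of representativeness problem the abstract attributes to random generation.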
