Enriching Representation and Enhancing Nearest Neighbor Classification of Slope/Landslide Data Using Rectified Feature Line Segments and Hypersphere-Based Scaling: A Reproducible Experimental Comparison

Mathematical Geosciences - Volume 55 - Pages 1125-1145 - 2023
Y. M. Ospina-Dávila (1), Mauricio Orozco-Alzate (2)
(1) Departamento de Ingeniería Eléctrica, Electrónica y Computación, Universidad Nacional de Colombia – Sede Manizales, Manizales, Colombia
(2) Departamento de Informática y Computación, Universidad Nacional de Colombia – Sede Manizales, Manizales, Colombia

Abstract

Measurements of geotechnical and natural hazard engineering features, combined with pattern recognition algorithms, allow the stability of slopes to be categorized into two main classes of interest: stable or at risk of collapse. The problem of slope stability can be further generalized to that of assessing landslide susceptibility. Many methods, ranging from simple to complex, have been applied to these problems, often with only scarce data available. Simple classification methods are preferred for the sake of both parsimony and interpretability, as well as to avoid drawbacks such as overtraining. In this paper, an experimental comparison is carried out among three simple but powerful existing variants of the well-known nearest neighbor rule for classifying slope/landslide data. One of the variants enhances the representational capacity of the data using so-called feature line segments, while all three rely on the concept of a territorial hypersphere per prototype feature point. Additionally, this experimental comparison is entirely reproducible: Python implementations are provided for all the methods and for the main simulation, and the experiments are performed using three publicly available datasets, two related to slope stability and one to landslide susceptibility. Results show that the three variants are very competitive and easily applicable.
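
The idea of a territorial hypersphere per prototype can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes one common definition, in which the radius of each prototype's hypersphere is the distance to its nearest prototype of a different class, and each query-to-prototype distance is divided by that radius before the nearest neighbor is chosen. The function names `territorial_radii` and `scaled_nn_predict` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def territorial_radii(X, y):
    """Radius per prototype: distance to the nearest prototype of another class.

    This definition is an assumption made for the sketch.
    """
    # Pairwise Euclidean distances between all prototypes.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Ignore same-class pairs (including each prototype with itself).
    D[y[:, None] == y[None, :]] = np.inf
    return D.min(axis=1)


def scaled_nn_predict(X, y, radii, queries):
    """1-NN rule in which each distance is divided by the prototype's radius."""
    d = np.linalg.norm(queries[:, None, :] - X[None, :, :], axis=2)
    return y[np.argmin(d / radii[None, :], axis=1)]


# Made-up toy data with two features: 0 = stable, 1 = at risk of collapse.
# The prototype at (3, 0) is an "at risk" point lying inside the "stable" region.
X = np.array([[0.0, 0.0], [4.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
y = np.array([0, 0, 1, 1])
query = np.array([[1.6, 0.0]])

radii = territorial_radii(X, y)

# Plain 1-NN for comparison: unscaled Euclidean distances.
plain_pred = y[np.argmin(np.linalg.norm(query[:, None, :] - X[None, :, :], axis=2), axis=1)]
scaled_pred = scaled_nn_predict(X, y, radii, query)

print("radii:", radii)
print("plain 1-NN:", plain_pred, "scaled 1-NN:", scaled_pred)
```

In this toy case the plain rule assigns the query to the isolated "at risk" prototype, whereas the scaled rule favors the "stable" prototype with the larger territory. This shows how such scaling can reduce the influence of prototypes that sit close to the other class.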
