Influence of Data Splitting on Performance of Machine Learning Models in Prediction of Shear Strength of Soil

Mathematical Problems in Engineering - Volume 2021 - Pages 1-15 - 2021
Quang Hung Nguyen1, Hai‐Bang Ly2, Lanh Si Ho3,2, Nadhir Al‐Ansari4, Hiep Van Le5, Van Quan Tran2, Indra Prakash6, Binh Thai Pham2
1Thuyloi University, Hanoi 100000, Vietnam
2University of Transport Technology, Hanoi 100000, Vietnam
3Civil and Environmental Engineering Program, Graduate School of Advanced Science and Engineering, Hiroshima University, 1-4-1, Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8527, Japan
4Department of Civil Environmental and Natural Resources Engineering, Lulea University of Technology, 971 87, Lulea, Sweden
5Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
6Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), Gandhinagar 382002, India

Abstract

The main objective of this study is to evaluate and compare the performance of three machine learning (ML) algorithms, namely Artificial Neural Network (ANN), Extreme Learning Machine (ELM), and Boosting Trees (Boosted), under various training-to-testing ratios for predicting soil shear strength, one of the most critical geotechnical properties in civil engineering design and construction. To this end, a database of 538 soil samples collected from the Long Phu 1 power plant project, Vietnam, was used to generate the datasets for the modeling process. Nine ratios (10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80/20, and 90/10) were used to divide the data into training and testing sets for assessing model performance. Common statistical indicators, namely Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Correlation Coefficient (R), were employed to evaluate the predictive capability of the models under each ratio. In addition, Monte Carlo simulation was carried out to account for the effect of random sampling on model performance. The results showed that although all three ML models performed well, ANN was the most accurate and statistically stable model after 1000 Monte Carlo simulations (mean R = 0.9348), followed by Boosted (mean R = 0.9192) and ELM (mean R = 0.8703). The predictive capability of the ML models was strongly affected by the training/testing ratio, with the 70/30 ratio yielding the best performance. In summary, the results provide an effective way to select an appropriate data-splitting ratio and the best ML model for accurately predicting soil shear strength, which could be helpful in the design and engineering phases of construction projects.
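
The evaluation protocol summarized above (random training/testing splits at nine ratios, scoring with RMSE, MAE and R, repeated over many Monte Carlo runs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`rmse`, `mae`, `pearson_r`, `random_split`, `fit_line`, `monte_carlo`, `synthetic_data`) are hypothetical, the data are synthetic rather than the 538 Long Phu 1 samples, and a one-variable least-squares line stands in for the ANN, ELM and Boosted models.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root Mean Squared Error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean Absolute Error: mean(|y - y_hat|)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient R between observed and predicted values."""
    mt = statistics.fmean(y_true)
    mp = statistics.fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def random_split(xs, ys, train_fraction, rng):
    """Shuffle sample indices and cut them into training and testing subsets."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_train = round(train_fraction * len(xs))
    tr, te = idx[:n_train], idx[n_train:]
    return ([xs[i] for i in tr], [ys[i] for i in tr],
            [xs[i] for i in te], [ys[i] for i in te])


def fit_line(xs, ys):
    """Hypothetical stand-in model: 1-D ordinary least squares (not ANN/ELM/Boosted)."""
    mx = statistics.fmean(xs)
    my = statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def monte_carlo(xs, ys, train_fraction, n_runs=1000, seed=0):
    """Repeat split -> fit -> score n_runs times; return mean RMSE, MAE and R."""
    rng = random.Random(seed)
    totals = {"RMSE": 0.0, "MAE": 0.0, "R": 0.0}
    for _ in range(n_runs):
        x_tr, y_tr, x_te, y_te = random_split(xs, ys, train_fraction, rng)
        model = fit_line(x_tr, y_tr)
        y_hat = [model(x) for x in x_te]
        totals["RMSE"] += rmse(y_te, y_hat)
        totals["MAE"] += mae(y_te, y_hat)
        totals["R"] += pearson_r(y_te, y_hat)
    return {name: total / n_runs for name, total in totals.items()}


def synthetic_data(n=538, seed=42):
    """Hypothetical synthetic data (y = 2x + noise); NOT the Long Phu 1 soil samples."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys


if __name__ == "__main__":
    xs, ys = synthetic_data()
    # Nine training/testing ratios, from 10/90 up to 90/10.
    for k in range(1, 10):
        frac = k / 10
        # The study used 1000 simulations; 200 keeps this demo quick.
        scores = monte_carlo(xs, ys, frac, n_runs=200)
        print(f"{k * 10}/{100 - k * 10}: "
              + ", ".join(f"{m} = {v:.4f}" for m, v in scores.items()))
```

Replacing `fit_line` with any model that follows the same fit-then-predict shape lets the loop compare models; averaging the scores over many random splits separates the effect of the ratio itself from the luck of a single split.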

