An application of machine learning regression to feature selection: a study of logistics performance and economic attributes

Neural Computing and Applications - Volume 34 - Pages 15781-15805 - 2022
Suriyan Jomthanachai¹, Wai Peng Wong², Khai Wah Khaw³
¹Faculty of Management Sciences, Prince of Songkla University, PSU, Songkhla, Thailand
²School of Information Technology, Monash University, Subang Jaya, Malaysia
³School of Management, Universiti Sains Malaysia, USM, George Town, Malaysia

Abstract

This study shows how dynamic, real-time economic big data can be put to use in selecting the economic attributes that reflect logistics performance as measured by the Logistics Performance Index (LPI). The analysis exploits the high productivity of machine learning (ML) for prediction, or regression, from suitable economic features. The goal is to identify the ideal set of economic attributes that best describes a given predictor variable for forecasting a country's logistics performance. In addition, several promising ML regression algorithms that could be used to optimize prediction accuracy are considered. Feature selection relies on filter techniques based on correlation and principal component analysis (PCA), as well as on the embedded techniques of LASSO and Elastic-net regression. On the selected features, ML regression methods, namely an artificial neural network (ANN), a multilayer perceptron (MLP), support vector regression (SVR), random forest regression (RFR), and Ridge regression, are then used to train on and validate the dataset. The findings show that the PCA and Elastic-net feature sets deliver the closest performance in terms of the error-measurement criteria. A union and intersection procedure over the acceptable feature sets is applied to reach a more accurate decision, and the union of the feature sets ultimately yields the best results. The results indicate that ML algorithms can help select an appropriate set of economic factors that reflect a country's logistics performance. Moreover, the ANN proved to be the most effective predictive model in this study.
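The feature-selection step described above can be sketched in plain Python. It covers a correlation filter and the embedded LASSO and Elastic-net methods, followed by the union and intersection of the resulting feature sets. This is a minimal illustration under stated assumptions, not the authors' implementation. The data are synthetic and hypothetical: six made-up candidate indicators `x0` to `x5`, of which only `x0` and `x1` drive the target. The threshold `0.3` and the penalty values `lam` and `l1_ratio` are arbitrary choices. The penalized models are fit with textbook cyclic coordinate descent and soft-thresholding. PCA and the ANN, MLP, SVR, RFR, and Ridge regressors are not shown.

```python
import math
import random


def pearson(a, b):
    # Pearson correlation coefficient between two equal-length sequences.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def standardize(col):
    # Zero mean and unit (population) variance, so one penalty level treats all features alike.
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in col) / n)
    return [(v - m) / sd for v in col]


def correlation_filter(features, y, threshold):
    # Filter method: keep every feature whose |Pearson r| with the target reaches the threshold.
    return {name for name, col in features.items() if abs(pearson(col, y)) >= threshold}


def soft_threshold(z, gamma):
    # Proximal operator of the L1 penalty; it is what sets coefficients exactly to zero.
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(features, y, lam, l1_ratio, n_iter=100):
    # Cyclic coordinate descent for the objective
    #   (1/2n)||y - Xb||^2 + lam * l1_ratio * ||b||_1 + lam * (1 - l1_ratio) / 2 * ||b||_2^2
    # l1_ratio = 1.0 gives LASSO. Assumes standardized columns and a centered target (no intercept).
    names = list(features)
    cols = [features[k] for k in names]
    n = len(y)
    b = [0.0] * len(cols)
    resid = list(y)  # residual y - Xb for the starting point b = 0
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            sq = sum(x * x for x in col) / n
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(x * r for x, r in zip(col, resid)) / n + sq * b[j]
            new_bj = soft_threshold(rho, lam * l1_ratio) / (sq + lam * (1.0 - l1_ratio))
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(col, resid)]
                b[j] = new_bj
    return dict(zip(names, b))


def nonzero(coefs, tol=1e-10):
    # Embedded selection: a feature is kept when its penalized coefficient is non-zero.
    return {name for name, w in coefs.items() if abs(w) > tol}


# Hypothetical synthetic data (not the paper's dataset): six candidate indicators,
# of which only x0 and x1 actually drive the target.
rng = random.Random(0)
n = 200
raw = {f"x{j}": [rng.gauss(0.0, 1.0) for _ in range(n)] for j in range(6)}
y_raw = [3.0 * a - 2.0 * b + rng.gauss(0.0, 0.5) for a, b in zip(raw["x0"], raw["x1"])]
y_mean = sum(y_raw) / n
y = [v - y_mean for v in y_raw]  # centered target
X = {name: standardize(col) for name, col in raw.items()}

sets = {
    "correlation": correlation_filter(X, y, threshold=0.3),
    "lasso": nonzero(elastic_net(X, y, lam=0.5, l1_ratio=1.0)),
    "elastic_net": nonzero(elastic_net(X, y, lam=1.0, l1_ratio=0.5)),
}
union = set.union(*sets.values())
intersection = set.intersection(*sets.values())

for method, chosen in sets.items():
    print(method, sorted(chosen))
print("union", sorted(union))
print("intersection", sorted(intersection))
```

The filter keeps features by the size of their correlation with the target. The embedded methods keep the features whose penalized coefficients stay non-zero. The union is the more permissive combined set, and the intersection is the more conservative one. With real data, an established library such as scikit-learn's `Lasso` and `ElasticNet` estimators would be the more likely choice, with the penalty chosen by cross-validation. In that library, `lam` here corresponds to `alpha` and the mixing parameter is also called `l1_ratio`.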

Keywords

#machine learning #regression #logistics performance #economic attributes #Logistics Performance Index
