A multi-factor two-stage deep integration model for stock price prediction based on intelligent optimization and feature clustering

Artificial Intelligence Review - Volume 56 - Pages 7237-7262 - 2022
Jujie Wang1,2, Shuzhou Zhu1
1School of Management Science and Engineering, Nanjing University of Information Science and Technology, Nanjing, China
2Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science and Technology, Nanjing, China

Abstract

Stock market fluctuations have a great impact on economic and financial activities worldwide. Accurate prediction of stock prices plays a decisive role in making investment decisions and hedging risk. However, accurate prediction is a thorny task, because stock price fluctuations are non-linear and chaotic. To improve the accuracy of stock price prediction, this paper proposes a multi-factor two-stage deep learning integrated prediction system based on intelligent optimization and feature clustering. First, a multi-factor analysis is carried out to select a variety of factors that affect the stock price, and the extreme gradient boosting (XGBoost) algorithm is used to eliminate factors with low correlation. Second, following the idea of classification prediction, the filtered feature set is clustered. Next, multiple parameters of the long short-term memory (LSTM) network are optimized by a genetic algorithm (GA), and multiple GA-LSTM models are obtained by training one on each clustering result. Finally, the per-class predictions of the GA-LSTM models are nonlinearly integrated to obtain the final prediction model, which is applied to the test set. The experimental results indicate that the proposed model outperforms the baseline models on two Chinese stock markets and the New York Stock Exchange. These results support the conclusion that the proposed model has more reliable and stronger predictive ability.
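The GA step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the search space, the selection, crossover and mutation scheme, and every function name below are assumptions made for illustration. In particular, `toy_validation_error` is a hypothetical stand-in for "train an LSTM with these parameters and return its validation error", used so that the example runs with the Python standard library alone.

```python
import random

# Hypothetical search space for three LSTM hyperparameters (assumed values,
# not taken from the paper).
SEARCH_SPACE = {
    "units": [16, 32, 64, 128],
    "window": [5, 10, 20, 30],
    "learning_rate": [0.0001, 0.001, 0.01],
}
KEYS = list(SEARCH_SPACE)


def toy_validation_error(params):
    """Stand-in for training an LSTM and measuring validation error.

    This surrogate is an assumption; by construction its minimum (0.0) is at
    units=64, window=20, learning_rate=0.001.
    """
    return (
        abs(params["units"] - 64) / 64
        + abs(params["window"] - 20) / 20
        + abs(params["learning_rate"] - 0.001) * 100
    )


def random_individual(rng):
    # One individual = one value chosen for each hyperparameter.
    return {k: rng.choice(SEARCH_SPACE[k]) for k in KEYS}


def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in KEYS}


def mutate(individual, rng, rate=0.2):
    # With probability `rate`, resample a gene from its allowed values.
    return {
        k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
        for k, v in individual.items()
    }


def genetic_search(fitness, pop_size=12, generations=15, seed=0):
    """Minimise `fitness` with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        # Truncation selection: the better half survives unchanged.
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return min(population, key=fitness)


best = genetic_search(toy_validation_error)
print(best, toy_validation_error(best))
```

In a real setting the fitness function would train and validate an LSTM for each candidate, which is far more expensive than this surrogate. Because the better half of each generation is carried over unchanged, the best error found never increases from one generation to the next.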
