A multi-factor two-stage deep integration model for stock price prediction based on intelligent optimization and feature clustering
Abstract
Stock market fluctuations strongly affect economic and financial activity worldwide, and accurate stock price prediction plays a decisive role in making investment decisions and hedging risk. Accurate prediction is difficult, however, because stock price fluctuations are non-linear and chaotic. To improve prediction accuracy, this paper proposes a multi-factor two-stage deep learning integrated prediction system based on intelligent optimization and feature clustering. First, a multi-factor analysis selects a variety of factors that influence the stock price, and the extreme gradient boosting (XGBoost) algorithm is used to eliminate factors with low correlation. Second, following the idea of classification prediction, the filtered feature set is clustered. Third, multiple parameters of the long short-term memory (LSTM) network are optimized by a genetic algorithm (GA), and a GA-LSTM model is trained on each clustering result. Finally, the per-class GA-LSTM predictions are nonlinearly integrated to obtain the final prediction model, which is applied to the test set. The experimental results indicate that the proposed model outperforms the baseline models on two Chinese stock markets and the New York Stock Exchange, which supports the claim that it offers more reliable and more accurate predictions.
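The GA-based parameter search mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the search space (`BOUNDS`), the operators (uniform crossover, Gaussian mutation, tournament selection, two-individual elitism) and all function names are assumptions made for illustration. In particular, `toy_validation_error` is a hypothetical stand-in for training an LSTM on one cluster and returning its validation error.

```python
import random

# Hypothetical search space: (hidden units, look-back window, learning rate).
BOUNDS = [(8, 128), (5, 60), (0.0001, 0.01)]


def toy_validation_error(genes):
    """Stand-in for 'train an LSTM with these parameters, return its error'.

    Assumption: a smooth bowl whose minimum sits at (64, 20, 0.001).
    """
    units, window, lr = genes
    return (((units - 64) / 120) ** 2
            + ((window - 20) / 55) ** 2
            + ((lr - 0.001) / 0.0099) ** 2)


def random_individual(rng):
    # One candidate parameter set, drawn uniformly inside the bounds.
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]


def crossover(a, b, rng):
    # Uniform crossover: each gene comes from either parent with equal chance.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genes, rng, rate=0.2):
    # Gaussian mutation, clipped so every gene stays inside its bounds.
    out = []
    for g, (lo, hi) in zip(genes, BOUNDS):
        if rng.random() < rate:
            g += rng.gauss(0, 0.1 * (hi - lo))
        out.append(min(hi, max(lo, g)))
    return out


def tournament(pop, fitness, rng, k=3):
    # Pick k random individuals and keep the one with the lowest error.
    return min(rng.sample(pop, k), key=fitness)


def genetic_search(fitness, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best error at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        elite = pop[:2]  # elitism: the two best survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = elite + children
    return min(pop, key=fitness), history


if __name__ == "__main__":
    best, history = genetic_search(toy_validation_error)
    print("best parameters:", best)
    print("best error:", toy_validation_error(best))
```

In a real pipeline the fitness function would train one LSTM per candidate, so each evaluation is expensive and the population size and generation count would need to be chosen accordingly. Integer parameters such as the number of hidden units would also be rounded before use.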