DAViS: a unified solution for data collection, analyzation, and visualization in real-time stock market prediction

Suppawong Tuarob1, Poom Wettayakorn1, Ponpat Phetchai1, Siripong Traivijitkhun1, Sunghoon Lim2,3, Thanapon Noraset1, Tipajin Thaipisutikul1
1Faculty of Information and Communication Technology, Mahidol University, Nakhon Pathom, Thailand
2Department of Industrial Engineering, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
3Institute for the 4th Industrial Revolution, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea

Abstract

The explosion of online information, together with recent advances in information processing, storage, and sharing, natural language processing, and text mining, has enabled stock investors to uncover market movement and volatility from heterogeneous content. For example, a typical stock market investor reads the news, explores market sentiment, and analyzes technical details in order to make a sound decision before buying or selling a particular company’s stock. However, capturing a dynamic stock market trend is challenging owing to the high fluctuation and non-stationary nature of the stock market. Although existing studies have attempted to enhance stock prediction, few have provided a complete decision-support system that lets investors retrieve real-time data from multiple sources and extract insightful information for sound decision-making. To address this challenge, we propose a unified solution for data collection, analysis, and visualization in real-time stock market prediction that retrieves and processes relevant financial data from news articles, social media, and company technical information. We aim to provide not only useful information for stock investors but also meaningful visualization that enables investors to effectively interpret storyline events affecting stock prices. Specifically, we use an ensemble stacking of diversified machine-learning-based estimators and innovative contextual feature engineering to predict the next day’s stock prices. Experimental results show that our proposed stock forecasting method outperforms a traditional baseline, with an average mean absolute percentage error of 0.93. Our findings confirm that leveraging an ensemble scheme of machine learning methods with contextual information improves stock prediction performance. Finally, our study could be further extended to a wide variety of innovative financial applications that seek to incorporate external insight from contextual information such as large-scale online news articles and social media data.
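To make the two technical ideas in the abstract concrete, the following is a minimal sketch of stacking and of the mean absolute percentage error (MAPE), not the system described in the paper. Everything in it is an assumption for illustration: the two toy base estimators, the single-weight linear meta-learner, the function names, the price series, and the choice to express MAPE in percent.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent (assumed unit)."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


def naive_forecast(prices, t):
    """Hypothetical base estimator 1: tomorrow's price equals today's."""
    return prices[t]


def moving_average_forecast(prices, t, window=3):
    """Hypothetical base estimator 2: mean of the last `window` prices up to day t."""
    start = max(0, t - window + 1)
    return mean(prices[start:t + 1])


def fit_blend_weight(preds_a, preds_b, targets):
    """Meta-learner: least-squares weight w for the blend w*a + (1 - w)*b.

    Minimizing sum((w*a + (1 - w)*b - y)^2) over w gives
    w = sum((a - b)*(y - b)) / sum((a - b)^2).
    """
    num = sum((a - b) * (y - b) for a, b, y in zip(preds_a, preds_b, targets))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    return num / den if den else 0.5  # identical base predictions: any w works


def stacked_forecast(prices, t, w):
    """Combine the two base forecasts for day t + 1 using the learned weight."""
    a = naive_forecast(prices, t)
    b = moving_average_forecast(prices, t)
    return w * a + (1.0 - w) * b


if __name__ == "__main__":
    # Made-up closing prices, used only to exercise the code.
    prices = [100.0, 102.0, 101.0, 103.0, 104.0, 106.0, 105.0, 107.0, 108.0, 110.0]
    train_days = range(2, 6)  # day t is used to predict day t + 1
    test_days = range(6, 9)

    # Fit the meta-learner on base-estimator outputs from the training days.
    w = fit_blend_weight(
        [naive_forecast(prices, t) for t in train_days],
        [moving_average_forecast(prices, t) for t in train_days],
        [prices[t + 1] for t in train_days],
    )

    # Evaluate the stacked forecast on later, unseen days.
    actual = [prices[t + 1] for t in test_days]
    predicted = [stacked_forecast(prices, t, w) for t in test_days]
    print(f"blend weight: {w:.3f}, test MAPE: {mape(actual, predicted):.2f}%")
```

The meta-learner is fitted on earlier days and evaluated on later ones so that the reported error reflects unseen data. A fuller stacking setup would typically use more diverse base estimators, contextual features, and a richer meta-learner.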
