Paper:
Stock Prediction Based on News Text Analysis
Wentao Gu, Linghong Zhang, Houjiao Xi, and Suhao Zheng
Department of Statistics, School of Statistics and Mathematics, Zhejiang Gongshang University
18 Xuezheng Street, Xiasha Education Park, Hangzhou, Zhejiang 310018, China
With the rapid development of information technology, the volume of financial news text has grown enormously, and this abundant online news can influence investors’ decision-making and, through it, the stock market. Online news is therefore an important factor affecting market volatility, and quantifying news-media sentiment for stock-market prediction has become a popular research topic. In this study, a financial news sentiment lexicon and an auxiliary lexicon suited to the financial domain are constructed, and a sentiment index (SI) is built by assigning weights to semantic rules. A comprehensive sentiment index (CSI) is then constructed by applying principal component analysis to the SI together with structured stock-market trading data. Finally, both sentiment indices are added to generalized autoregressive conditional heteroscedasticity (GARCH) and long short-term memory (LSTM) models to predict stock returns. The results indicate that the LSTM models predict stock returns better than the GARCH models. Compared with general-purpose lexicons, the financial lexicons constructed in this study are more stable, and including the comprehensive investor sentiment index improves the accuracy with which sentiment information is measured. The proposed lexicons therefore capture the effects of external sentiment factors on stock-market returns more comprehensively and can improve the predictive performance of stock-return models.
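The SI and CSI construction summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the lexicon entries, the degree-adverb weights, the negation rule, and the helper names (`sentence_score`, `daily_si`, `composite_index`) are all hypothetical; the text is assumed to be whitespace-tokenized English (Chinese news would first need word segmentation); and the CSI is assumed to be the first principal component of the standardized SI and trading variables.

```python
import numpy as np

# Hypothetical toy sentiment lexicon (word -> polarity); NOT the paper's lexicon.
SENTIMENT_LEXICON = {"gain": 1.0, "profit": 1.0, "surge": 1.0,
                     "loss": -1.0, "slump": -1.0, "risk": -1.0}
# Hypothetical auxiliary lexicon: degree adverbs scale polarity, negators flip it.
DEGREE_WEIGHTS = {"slightly": 0.5, "very": 1.5, "sharply": 2.0}
NEGATORS = {"not", "no", "never"}


def sentence_score(tokens):
    """Sum lexicon polarities, applying the modifiers that precede each sentiment word."""
    score, weight, sign = 0.0, 1.0, 1.0
    for tok in tokens:
        if tok in NEGATORS:
            sign = -sign                      # assumed rule: each negator flips polarity
        elif tok in DEGREE_WEIGHTS:
            weight *= DEGREE_WEIGHTS[tok]     # assumed rule: degree adverbs multiply
        elif tok in SENTIMENT_LEXICON:
            score += sign * weight * SENTIMENT_LEXICON[tok]
            weight, sign = 1.0, 1.0           # modifiers apply only to the next sentiment word
    return score


def daily_si(articles):
    """Hypothetical daily SI: mean score of that day's articles (0.0 if there are none)."""
    if not articles:
        return 0.0
    return sum(sentence_score(a.lower().split()) for a in articles) / len(articles)


def composite_index(features):
    """First principal component of standardized columns (rows = days, column 0 = SI)."""
    x = np.asarray(features, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)   # standardize each variable
    corr = np.corrcoef(x, rowvar=False)        # PCA on the correlation matrix
    _, eigvecs = np.linalg.eigh(corr)          # eigenvalues returned in ascending order
    loading = eigvecs[:, -1]                   # eigenvector of the largest eigenvalue
    if loading[0] < 0:                         # orient so a higher CSI means a higher SI
        loading = -loading
    return z @ loading, loading
```

For example, `daily_si(["Shares gain after profit report"])` scores two positive words and returns `2.0`, while `sentence_score(["sharply", "slump"])` returns `-2.0`. The resulting SI and CSI series could then enter a GARCH mean or variance equation as exogenous regressors, or be appended to an LSTM's input features; those model-fitting steps are omitted from this sketch.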
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.