Stock Market Trend Prediction Based on Text Mining of Corporate Web and Time Series Data
Hoang T. P. Thanh* and Phayung Meesad**
*Department of Information Technology, Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Wongsawang, Bangsue, Bangkok, Thailand
**Department of Information Technology Management, Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, 1518 Pracharat Sai 1 Road, Wongsawang, Bangsue, Bangkok, Thailand
Predicting the behavior of stock markets is always an interesting topic, not only for financial investors but also for scholars and professionals from different fields, because successful prediction can help investors yield significant profits. Previous research has shown a strong correlation between financial news and the movements of stock prices. This paper proposes an approach that uses time series analysis and text mining techniques to predict daily stock market trends. The research uses a database containing stock index prices and news articles collected from Vietnamese websites over 3 years, from 2010 to 2012. A robust feature selection method and a strong machine learning algorithm can raise forecasting accuracy. By combining Linear Support Vector Machine Weight with the Support Vector Machine algorithm, the proposed approach improves prediction accuracy significantly above that of related research approaches. The results show that the data set represented by 42 features achieves the highest accuracy with one-against-one Support Vector Machines (up to 75%), and that the one-against-one method outperforms the one-against-all method in almost all case studies.
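The two ingredients named in the abstract, weight-based feature selection and the one-against-one and one-against-all decompositions, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the trend labels "up", "down", and "flat", and the numeric values are hypothetical; the classifier weights and decision values are assumed to come from already-trained linear SVMs, and a single weight vector stands in for what would be several vectors in a multiclass setting.

```python
from collections import Counter


def rank_features_by_weight(weights, k):
    """Keep the k features whose linear-SVM weights have the largest magnitude.

    `weights` is assumed to be the weight vector of a trained linear SVM.
    Returns the selected feature indices in ascending order.
    """
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return sorted(order[:k])


def one_against_all(scores):
    """Pick the class whose 'class vs. rest' classifier gives the largest decision value.

    `scores` maps each class label to the decision value of its binary classifier.
    """
    return max(scores, key=scores.get)


def one_against_one(pairwise):
    """Pick the class that wins the most pairwise contests (majority voting).

    `pairwise` maps a pair (a, b) to the decision value of the 'a vs. b'
    classifier; a positive value is a vote for a, otherwise for b.
    """
    votes = Counter(a if value > 0 else b for (a, b), value in pairwise.items())
    return votes.most_common(1)[0][0]


# Hypothetical values, for illustration only.
selected = rank_features_by_weight([0.1, -2.0, 0.5, 1.5], k=2)
ova_label = one_against_all({"up": 0.3, "down": -1.0, "flat": 0.8})
ovo_label = one_against_one(
    {("up", "down"): 1.2, ("up", "flat"): 0.4, ("down", "flat"): -0.7}
)
```

With K classes, one-against-all trains K binary classifiers, while one-against-one trains K(K-1)/2 and combines them by voting, which is why the two schemes can produce different labels for the same input.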