Paper:
Recommendation System Using Weighted TF-IDF and Naive Bayes Classifiers on RSS Contents
Incheon Paik* and Hiroshi Mizugai**
*School of Computer Science, University of Aizu, Tsuruga, Ikki-machi, Aizu-Wakamatsu City, Fukushima 965-8580, Japan
**Development Division, Rakuten Co. Ltd., Shinagawa Seaside Rakuten Tower, 4-12-3 Higashishinagawa, Shinagawa-ku, Tokyo 140-0002, Japan
The widespread use of blogs has led to a recent increase in RDF Site Summary (RSS) feeds, which are used for news updates and blog posts. Because of this enormous quantity of material, searching the contents of RSS feeds now takes considerable effort. Recommendation systems address this problem by enabling users to obtain relevant RSS contents easily and quickly. Previous research proposed an RSS recommendation system based on the similarity between the Term Frequency (TF) of the RSS contents and the TF derived from the contents of the user’s browsing history for RSS feeds. In this paper, we build on Term Frequency-Inverse Document Frequency (TF-IDF) calculations to propose a Weighted TF-IDF method, which treats the terms enclosed by the title tags in RSS contents as characteristic terms. In addition, we propose a new recommendation method that uses a Naive Bayes classifier in a machine-learning-based approach. Through experiments with a prototype recommendation system, we compare the proposed methods with the existing method and show that the proposed methods outperform it on several evaluation measures.
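As a rough illustration of the two ideas named in the abstract, the following Python sketch ranks RSS items against a browsing history using a title-weighted TF-IDF vector and cosine similarity, and classifies items with a small multinomial Naive Bayes model. It is only a sketch under stated assumptions, not the paper's implementation: the title boost of 2.0, the smoothed IDF formula, the add-one smoothing, the labels "read" and "skip", and all function names are hypothetical choices made here, and the input is assumed to be already tokenized.

```python
import math
from collections import Counter

TITLE_WEIGHT = 2.0  # assumed boost for title terms; the abstract gives no value


def weighted_tf(title_terms, body_terms, title_weight=TITLE_WEIGHT):
    """Term counts in which each title occurrence adds title_weight."""
    tf = Counter(body_terms)
    for term in title_terms:
        tf[term] += title_weight
    return tf


def idf(docs_terms):
    """Smoothed inverse document frequency (an assumed variant)."""
    n = len(docs_terms)
    df = Counter(term for terms in docs_terms for term in set(terms))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(tf, idf_table):
    """Multiply each term weight by its IDF; unknown terms get 0."""
    return {t: w * idf_table.get(t, 0.0) for t, w in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_items(items, history, title_weight=TITLE_WEIGHT):
    """Return (similarity, item_index) pairs, most similar first.

    items and history are lists of (title_terms, body_terms) pairs.
    """
    idf_table = idf([title + body for title, body in items + history])
    profile_tf = Counter()
    for title, body in history:
        profile_tf.update(weighted_tf(title, body, title_weight))
    profile = tfidf_vector(profile_tf, idf_table)
    scored = []
    for i, (title, body) in enumerate(items):
        vec = tfidf_vector(weighted_tf(title, body, title_weight), idf_table)
        scored.append((cosine(profile, vec), i))
    return sorted(scored, reverse=True)


def train_nb(labeled_docs):
    """Collect counts for multinomial Naive Bayes from (terms, label) pairs."""
    class_docs, term_counts, vocab = Counter(), {}, set()
    for terms, label in labeled_docs:
        class_docs[label] += 1
        term_counts.setdefault(label, Counter()).update(terms)
        vocab.update(terms)
    return class_docs, term_counts, vocab


def predict_nb(model, terms):
    """Return the label with the highest log posterior (add-one smoothing)."""
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label, n_docs in class_docs.items():
        counts = term_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(n_docs / total_docs)
        for term in terms:
            if term in vocab:  # terms never seen in training are ignored
                score += math.log((counts[term] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)
```

In this sketch, `rank_items` would be called with the candidate feed entries and the entries the user has already opened, while `train_nb` and `predict_nb` would be trained on entries labeled by whether the user read them. Raising `title_weight` makes title terms dominate the similarity score, which is the effect the Weighted TF-IDF method aims for.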
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.