Finding Communities Using User Preference in Web Structure Mining
Takeshi Yoshikawa and Hidetoshi Nonaka
Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Japan
Web structure mining analyzes the graph structure of hyperlinks and does not use the content of web pages. The HITS and PageRank algorithms are popular methods for web structure mining. In this study, we address an algorithm for finding web communities in web structure mining. The algorithm receives the URLs of some web pages already known to the user and, using the structure of a bipartite graph, proposes candidate pages in the web community to the user. We investigate the effect of introducing a user preference for each known page and discuss ways to improve the community-finding algorithm.
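As a rough illustration of the idea summarized above, the following minimal sketch scores candidate pages from a bipartite fan/center structure, with each known page carrying a preference weight. It is not the algorithm evaluated in this paper: the function name `rank_candidates`, the input format (a dictionary of outgoing links), the additive weighting scheme, and the example page names are all assumptions made for illustration.

```python
from collections import defaultdict


def rank_candidates(out_links, seeds):
    """Score pages co-cited with the user's known pages (illustrative only).

    out_links: dict mapping a page URL to the set of URLs it links to.
    seeds: dict mapping each known page URL to a preference weight.
    """
    # Fan side of the bipartite graph: pages linking to at least one
    # known page, weighted by the total preference of those known pages.
    fan_weight = {}
    for page, targets in out_links.items():
        weight = sum(seeds[t] for t in targets if t in seeds)
        if weight > 0:
            fan_weight[page] = weight

    # Center side: other pages the fans link to. Each fan adds its
    # weight to every non-seed page it points at.
    scores = defaultdict(float)
    for fan, weight in fan_weight.items():
        for target in out_links[fan]:
            if target not in seeds and target != fan:
                scores[target] += weight

    # Highest score first; ties broken by URL for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical link data: the user knows pages "a" and "b" and prefers "a".
links = {
    "fan1": {"a", "b", "c"},
    "fan2": {"a", "c"},
    "fan3": {"b", "d"},
    "other": {"d", "e"},
}
preferences = {"a": 2.0, "b": 1.0}
candidates = rank_candidates(links, preferences)
```

In this toy data, page "c" is linked from two fans that also point at the preferred page "a", so it ranks above "d", which shares only one fan with the less preferred page "b". Setting all weights equal reduces the sketch to plain co-citation counting.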
-  J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “The Web as a Graph: Measurements, Models, and Methods,” Proc. of the 5th Annual Int. Conf. on Computing and Combinatorics, Lecture Notes in Computer Science, Vol.1627, 1999.
-  L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Technical Report, Stanford University, 1998.
-  Z. Gyongyi, H. Garcia-Molina, and J. Pedersen, “Combating Web Spam with TrustRank,” Technical Report, Stanford University, 2004.
-  H. Shimizu, “A Study on Web Mining Based on Web Communities,” Graduation Thesis, Hokkaido University, 2006. (in Japanese)
-  T. Murata, “Finding Related Web Pages Based on Connectivity Information from a Search Engine,” Poster Proc. of the Tenth Int. World Wide Web Conf. (WWW10), 2001.
-  K. Eguchi, K. Oyama, A. Aizawa, and H. Ishikawa, “Overview of the Informational Retrieval Task at NTCIR-4 WEB,” Proc. of the Fourth NTCIR Workshop on Research in Information Access Technologies: Information Retrieval, Question Answering and Summarization, 2004.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.