A Method for Accelerating the HITS Algorithm
Andri Mirzal and Masashi Furukawa
Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Kita-Ku, Sapporo 060-0814, Japan
We present a new method to accelerate the HITS algorithm by exploiting hyperlink structure of the web graph. The proposed algorithm extends the idea of authority and hub scores from HITS by introducing two diagonal matrices which contain constants that act as weights to make authority pages more authoritative and hub pages more hubby. This method works because in the web graph good authorities are pointed to by good hubs and good hubs point to good authorities. Consequently, these pages will collect their scores faster under the proposed algorithm than under the standard HITS. We show that the authority and hub vectors of the proposed algorithm exist but are not necessarily be unique, and then give a treatment to ensure the uniqueness property of the vectors. The experimental results show that the proposed algorithm can improve HITS computations, especially for back button datasets.
-  J.M. Kleinberg, “Authoritative Sources in a Hyperlink Environment,” J. of the ACM Vol.46, pp. 604-632, 1999.
-  L. Page, S. Brin, R. Motwani, and T. Winograd, “The PageRank Citation Ranking: Bringing Order to the Web,” Stanford Digital Libraries Working Paper, 1998.
-  A. N. Langville and C. D. Meyer, “Google’s PageRank and Beyond: The Science of Search Engine Rankings,” Princeton University Press, 2006.
-  K. Bharat and M. R. Henzinger, “Improved Algorithms for Topic Distillation in Hyperlinked Environment,” Proc. 21st Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM Press, pp. 104-111, 1998.
-  T. H. Haveliwala, “Topic Sensitive PageRank: A Context-sensitive Ranking Algorithm for Web Search,” IEEE Trans. on Knowledge and Data Engineering, Vol.15 No.4, pp. 784-796, 2003.
-  B. N. Parlett, “The Symmetric Eigenvalue Problem,” SIAM, 1998.
-  S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub, “Extrapolation Methods for Accelerating PageRank Computations,” Proc. 12th Int. World Wide Web Conf., pp. 261-270, ACM Press, 2003.
-  S. D. Kamvar, T. H. Haveliwala, C. D. Manning, and G. H. Golub, “Exploiting the Block Structure of the Web for Computing Page-Rank,” Technical Report 2003-17, Stanford University, 2003.
-  A. Arasu, J. Novak, A. Tomkins, and J. Tomlin, “PageRank Computation and the Structure of the Web: Experiments and Algorithms,” Proc. of the 11th Inter. World Wide Web Conf., Poster Track, ACM Press, 2002.
-  C. P. C. Lee, G. H. Golub, and S. A. Zenios, “A Fast Two-Stage Algorithm for Computing PageRank and Its Extensions,” Technical Report SCCM-2003-15, Stanford University, 2003.
-  A. N. Langville, C.D. Meyer, “A Reordering for the PageRank Problem,” SIAM Journal on Scientific Computing Vol.27 No.6, pp. 2112-2120, 2006.
-  J. M. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, “The Web as a Graph: Measurements, Models, and Methods,” COCOON, pp. 1-17, 1999.
-  R. Albert, H. Jeong, and A. L. Barabasi, “Diameter of the Word-Wide Web,” Nature Vol.401, pp. 130, 1999.
-  A. Broder, R. Kumar, F. Maghout, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, “Graph Structure in the Web,” Computer Networks Vol.33, pp. 309-320, 2000.
-  T. H. Haveliwala, “Efficient Computation of PageRank,” Technical Report 1999-31, Computer Science Department, Stanford University, 1999.
-  C. Ding, H. Zha, X. He, P. Husbands, and H. Simon, “Link Analysis: Hub and Authorities on the World Wide Web. SIAM Review,” Vol.46 No.2, pp. 256-268, 2004.
-  C. Ding, X. He, H. Zha, and H. Simon, “PageRank, HITS and a unified framework for the link analysis,” Proc. of the 25th ACM SIGIR Conf., pp. 353-354, 2002.
-  A. N. Langville and C. D. Meyer, “Deeper Inside Pagerank,” Internet Mathematic J., Vol.1, No.3, pp. 335-380, 2005.
-  A. Farahat, T. Lofaro, J. C. Miller, G. Rae, and L. A. Ward, “Authority Rankings from HITS, PageRank, and SALSA: Existence, Uniqueness, and Effect of Initialization,” SIAM Journal of Scientific Computing, Vol.27, No.4, pp. 1181-1201, 2006.
-  A. Y. Ng, A. X. Zheng, and M. I. Jordan, “Link Analysis, Eigenvectors, and Stability,” The Seventh Int. Joint Conf. on Artificial Intelligence, 2001.
-  A. Y. Ng, A. X. Zheng, and M. I. Jordan, “Stable Algorithms for Link Analysis,” Proc. of the 24th Annual Int. ACM SIGIR Conf., 2001.
-  R. Fagin, A. R. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins, “Random Walks with Back Buttons,” Proc. of the 32nd ACM Symposium on Theory of Computing, 2001.
-  F. Mathieu and M. Bouklit, “The Effect of the Back Button in a Random Walk: Application for PageRank,” Proc. of the 13th Int. World Wide Web Conf., pp. 370-371, 2004.
-  M. Sydow, “Random Surfer with Back Step,” Proc. of the 13th Int. World Wide Web Conf., pp. 352-353, 2004.
-  N. Eiron, K. S. McCurley, and J. A. Tomlin, “Ranking the Web Frontier,” 13th Int. Word Wide Web Conf., ACM Press, 2004.
-  S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. M. Kleinberg, “Mining the Link Structure of the World Wide Web,” IEEE Computer Magazine, Vol.32, pp. 60-67, 1999.
-  A. Mirzal, “PyThinSearch: A Simple Web Search Engine,” Proc. Int. Conf. on Complex, Intelligent and Software Intensive Systems, pp. 1-8, IEEE Comp. Soc., 2009.
-  T. Segaran, http://kiwitobes.com/db/searchindex.db.
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.