Paper:
Knowledge Expansion Support by Related Search Keyword Generation Based on Wikipedia Category and Pointwise Mutual Information
Saori Kawauchi*, Tetsuya Toyota*,**, and Hajime Nobuhara*
*Department of Intelligent Interaction Technologies, University of Tsukuba, 1-1-1 Tenoudai, Tsukuba Science City, Ibaraki 305-8573, Japan
**Japan Society for the Promotion of Science, Sumitomo Ichibancho FS Bldg., 8 Ichibancho, Chiyoda-ku, Tokyo 102-8472, Japan
When users turn to search engines to learn about a subject in an unfamiliar domain, they often rely on the related search keywords suggested by the engine, which are generated from how frequently terms are used as search keywords. Such suggestions, however, do not always help users expand their knowledge of the subject being researched. We therefore propose a new method that generates related search keywords from Wikipedia. The proposed method first looks up the Wikipedia page whose title matches the user's query and extracts the categories to which that page belongs. Next, it obtains the set of pages in each category and takes the pages contained in the intersection of two or more of these sets as the group of related pages. It then extracts keywords from each related page, calculates either pointwise mutual information or tf-idf for them, and treats the keywords with the higher values as related search keywords. We confirmed the effectiveness of the proposed method through a comparison with the related search keywords generated by Google and through subjective evaluation experiments.
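As a rough illustration of the steps described above, the following Python sketch builds a related page group from pairwise category intersections and ranks keywords by pointwise mutual information. The function names, the toy category and keyword data, and the page-level PMI estimate (co-occurrence of "the keyword appears on a page" and "the page is in the related group") are assumptions made for illustration; the abstract does not state how the probabilities are estimated, so this should not be read as the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def related_pages(query, page_categories, category_pages):
    """Return pages that share at least two categories with the query page."""
    categories = page_categories.get(query, set())
    related = set()
    # Union of the intersections of every pair of category page sets.
    for cat_a, cat_b in combinations(sorted(categories), 2):
        related |= category_pages[cat_a] & category_pages[cat_b]
    related.discard(query)  # the query page itself is not a suggestion
    return related


def pmi_scores(related, page_keywords):
    """Score keywords by PMI with membership in the related page group.

    PMI(k, G) = log2( p(k, G) / (p(k) * p(G)) ), with probabilities
    estimated as page counts over the whole (hypothetical) corpus.
    """
    n_pages = len(page_keywords)
    # Number of pages in the corpus that contain each keyword.
    doc_freq = Counter(
        kw for kws in page_keywords.values() for kw in set(kws)
    )
    # Number of related pages that contain each keyword.
    joint = Counter(
        kw for page in related for kw in set(page_keywords.get(page, ()))
    )
    # (joint/n) / ((|G|/n) * (df/n)) simplifies to joint*n / (|G| * df).
    return {
        kw: math.log2(joint[kw] * n_pages / (len(related) * doc_freq[kw]))
        for kw in joint
    }


# Toy data (hypothetical, not taken from Wikipedia).
page_categories = {
    "Sushi": {"Japanese cuisine", "Seafood dishes", "Rice dishes"},
}
category_pages = {
    "Japanese cuisine": {"Sushi", "Sashimi", "Ramen", "Onigiri"},
    "Seafood dishes": {"Sushi", "Sashimi", "Paella"},
    "Rice dishes": {"Sushi", "Onigiri", "Paella"},
}
page_keywords = {
    "Sushi": ["rice", "fish", "wasabi"],
    "Sashimi": ["fish", "wasabi", "soy sauce"],
    "Onigiri": ["rice", "nori"],
    "Paella": ["rice", "saffron", "fish"],
    "Ramen": ["noodle", "soy sauce"],
    "Pizza": ["cheese", "tomato"],
}

related = related_pages("Sushi", page_categories, category_pages)
scores = pmi_scores(related, page_keywords)
ranked = sorted(scores, key=scores.get, reverse=True)
print(sorted(related))
print(ranked)
```

In this toy corpus, "Ramen" is left out because it shares only one category with the query page, and keywords that occur only on related pages receive the highest scores. That also reflects the known tendency of PMI to favor rare terms; a tf-idf weighting could be substituted for `pmi_scores` under the same interface.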
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.