JACIII Vol.16 No.2 pp. 247-255
doi: 10.20965/jaciii.2012.p0247
(2012)

Paper:

Knowledge Expansion Support by Related Search Keyword Generation Based on Wikipedia Category and Pointwise Mutual Information

Saori Kawauchi*, Tetsuya Toyota*,**,
and Hajime Nobuhara*

*Department of Intelligent Interaction Technologies, University of Tsukuba, 1-1-1 Tenoudai, Tsukuba Science City, Ibaraki 305-8573, Japan

**Japan Society for the Promotion of Science, Sumitomo Ichibancho FS Bldg., 8 Ichibancho, Chiyoda-ku, Tokyo 102-8472, Japan

Received:
August 14, 2011
Accepted:
October 28, 2011
Published:
March 20, 2012
Keywords:
information retrieval, keyword extraction, pointwise mutual information, collective intelligence
Abstract

When users turn to search engines to acquire knowledge about a subject in an unfamiliar domain, they often rely on the related search keywords that the engine generates from usage frequency. Searching by way of such related keywords, however, does not always help expand knowledge of the subject being researched. We therefore propose a new method that generates related search keywords from Wikipedia. The proposed method first looks up the Wikipedia page whose title matches the user's query and extracts the categories of that page. Next, it obtains the set of pages belonging to each category and extracts, as related pages, the pages contained in the intersection of two or more of these sets. It then calculates pointwise mutual information or tf-idf for the keywords extracted from each related page and presents the keywords with the highest values as related search keywords. We have confirmed the effectiveness of the proposed method through comparison with the related search keywords generated by Google and through subjective evaluation experiments.
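
The steps described in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the toy category data, the page term sets, the function names (`related_pages`, `pmi`, `rank_keywords`), and the document-level definition of pointwise mutual information are all hypothetical choices made here, since the abstract does not specify them. In particular, the sketch treats a "related page" as one that shares at least two categories with the query page.

```python
import math
from itertools import combinations

# Hypothetical toy data: category name -> titles of pages in that category.
CATEGORY_MEMBERS = {
    "Search engines": {"Google", "Bing", "Yahoo"},
    "Internet companies": {"Google", "Yahoo", "Amazon"},
    "Web services": {"Google", "Bing", "Amazon", "Wikipedia"},
}

# Hypothetical keyword sets extracted from each page.
PAGE_TERMS = {
    "Yahoo": {"Google", "search", "portal"},
    "Bing": {"Google", "search", "microsoft"},
    "Amazon": {"retail", "cloud"},
}


def related_pages(query, category_members):
    """Return pages that share at least two categories with the query page."""
    # Categories of the page whose title matches the query.
    cats = sorted(c for c, pages in category_members.items() if query in pages)
    related = set()
    # Intersect the page sets of every pair of those categories.
    for a, b in combinations(cats, 2):
        related |= category_members[a] & category_members[b]
    related.discard(query)
    return related


def pmi(query, term, docs):
    """Document-level PMI in bits; docs is a list of term sets (assumed definition)."""
    n = len(docs)
    n_q = sum(1 for d in docs if query in d)
    n_t = sum(1 for d in docs if term in d)
    n_qt = sum(1 for d in docs if query in d and term in d)
    if n_qt == 0:
        return float("-inf")  # the two terms never co-occur
    return math.log2(n_qt * n / (n_q * n_t))


def rank_keywords(query, docs, top_k=3):
    """Rank candidate keywords by PMI with the query, ties broken alphabetically."""
    candidates = set().union(*docs) - {query}
    scored = [(t, pmi(query, t, docs)) for t in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


RELATED = related_pages("Google", CATEGORY_MEMBERS)
DOCS = [PAGE_TERMS[p] for p in sorted(RELATED)]
TOP = rank_keywords("Google", DOCS)
```

In this toy data, "Wikipedia" appears in only one of the query page's categories, so it is not treated as a related page. Keywords that never co-occur with the query receive a score of negative infinity and fall to the bottom of the ranking. A tf-idf score could be substituted for `pmi` in `rank_keywords` without changing the rest of the sketch.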

Cite this article as:
Saori Kawauchi, Tetsuya Toyota, and Hajime Nobuhara, “Knowledge Expansion Support by Related Search Keyword Generation Based on Wikipedia Category and Pointwise Mutual Information,” J. Adv. Comput. Intell. Intell. Inform., Vol.16, No.2, pp. 247-255, 2012.

Last updated on Sep. 24, 2021