JACIII Vol.10 No.4, pp. 451-457 (2006)

doi: 10.20965/jaciii.2006.p0451

Paper:

Dual Scaling in Data Mining from Text Databases

Junzo Watada^*, Keisuke Aoki^**, Masahiro Kawano^*, and Muhammad Suzuri Hitam^***

^*Graduate School of Information, Production and Systems, Waseda University, 2-2 Hibikino, Wakamatsu-ku, Kitakyushu, Fukuoka 808-0135, Japan

^**AlaxaIA, New Kawasaki Mitsui BLDG West 13 F, 890 Kashimata, Saiwai, Kawasaki, Kanagawa 212-0058, Japan

^***University College of Science and Technology Malaysia, 21030 Mengabang Telipot, Kuala Terengganu, Malaysia

Received: June 15, 2005

Accepted: October 18, 2005

Published: July 20, 2006

Keywords: text mining, dual scaling, fuzzy quantification analysis, library data

Abstract

The growing availability of multimedia text documents has spread interest in text mining among researchers. Text documents integrate numerical and linguistic data, which makes text mining both interesting and challenging. We propose a text mining approach based on a fuzzy quantification model and a fuzzy thesaurus. The approach focuses on three elements: 1) breaking the sentences in Japanese text down into words; 2) a fuzzy thesaurus for finding words in the text that match keywords; and 3) fuzzy multivariate analysis for analyzing semantic meaning in predefined case studies. We use the fuzzy thesaurus to translate words written in Chinese and Japanese characters into keywords. This speeds up processing and does not require a dictionary to separate words. Fuzzy multivariate analysis is then applied to the processed data to extract latent, mutually related structures in the text data, i.e., knowledge that would otherwise remain obscured. We apply dual scaling to mining library and Web page text information, and propose integrating the results into Kansei engineering for possible application in sales, marketing, and production.
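As a rough illustration of the dual scaling step mentioned in the abstract, the following sketch applies reciprocal averaging (a classical way of computing the first dual scaling dimension) to a small document-by-keyword count table. It is not the authors' fuzzy quantification model: the counts are crisp rather than fuzzy memberships, and the function name `dual_scaling_first_axis`, the `iterations` parameter, and the toy table are all assumptions made up for this example.

```python
def dual_scaling_first_axis(table, iterations=200):
    """Reciprocal averaging on a documents-by-keywords count table.

    Returns (row_scores, col_scores) for the first non-trivial dimension.
    Assumes every row total and column total is non-zero.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand = float(sum(row_tot))

    def row_means(col):
        # Document score = frequency-weighted mean of its keywords' scores.
        return [sum(f * c for f, c in zip(table[i], col)) / row_tot[i]
                for i in range(n_rows)]

    # Arbitrary non-constant starting scores for the keywords.
    col = [float(j) for j in range(n_cols)]
    for _ in range(iterations):
        row = row_means(col)
        # Keyword score = frequency-weighted mean of the documents using it.
        col = [sum(table[i][j] * row[i] for i in range(n_rows)) / col_tot[j]
               for j in range(n_cols)]
        # Remove the trivial constant solution (weighted centering) ...
        mean = sum(w * c for w, c in zip(col_tot, col)) / grand
        col = [c - mean for c in col]
        # ... and rescale to unit weighted variance so the scores do not shrink.
        scale = (sum(w * c * c for w, c in zip(col_tot, col)) / grand) ** 0.5
        col = [c / scale for c in col]
    return row_means(col), col


# Hypothetical toy data: 4 documents (rows) x 4 keywords (columns).
# Documents 0-1 mostly use keywords 0-1; documents 2-3 mostly use keywords 2-3.
table = [
    [3, 2, 0, 0],
    [2, 3, 1, 0],
    [0, 1, 3, 2],
    [0, 0, 2, 3],
]
row_scores, col_scores = dual_scaling_first_axis(table)
print("document scores:", row_scores)
print("keyword scores: ", col_scores)
```

Documents and keywords that tend to co-occur receive scores of the same sign on this axis, which is the kind of latent structure the abstract refers to. A fuzzy variant would presumably replace the integer counts with membership grades obtained from the fuzzy thesaurus.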

Cite this article as:

J. Watada, K. Aoki, M. Kawano, and M. Hitam, “Dual Scaling in Data Mining from Text Databases,” J. Adv. Comput. Intell. Intell. Inform., Vol.10 No.4, pp. 451-457, 2006.

References

[1] K. Aoki, J. Watada, and T. Yabuuchi, “Data Mining from Text Data Base,” Proceedings, Kyushu and Yamaguchi Branch, Bio-Medical Fuzzy System Association, August 23, 2003 (in Japanese).
[2] C. Apte, “Text Mining Applications for Electronic Help Desk,” Proc. of the 4th Int. Conf. and Exhibition on the Practical Application of Knowledge Discovery and Data Mining, 2000, pp. 19-25.
[3] O. De Vel, “Mining e-mail authorship,” Proc. of KDD 2000 Workshop on Text Mining, 2000.
[4] R. Feldman, and I. Dagan, “Knowledge discovery in textual databases,” Proc. of the First International Conference on Knowledge Discovery and Data Mining, 1995, pp. 112-117.
[5] R. Feldman, I. Dagan, and H. Hirsh, “Mining text using keyword distributions,” J. Intell. Inf. Syst. 10, pp. 281-300, 1998.
[6] T. Fujimoto, and M. Sugeno, “Construction of verb thesaurus that uses fuzzy adjacent function,” Official Journal of Japan Society of Fuzzy Theory and Systems, Vol.11, No.3, pp. 462-471, 1999 (in Japanese).
[7] C. Hayashi, “On the Quantification of Qualitative Data from the Mathematico-Statistical Point of View,” Annals of the Institute of Stat. Math., II, 1, 1950.
See also C. Hayashi, “On the Prediction of Phenomena from Qualitative Data and the Quantification of Qualitative Data from the Mathematico-Statistical Point of View,” Annals of the Institute of Stat. Math., 3, 1952.
[8] C. Hayashi, “Quantification Method,” Toyo Keizai Shinpo-sha, 1972 (in Japanese).
See also C. Hayashi, I. Higuchi, and T. Komazawa, “Statistic in Information Processing,” Sangyo Tosho, 1970 (in Japanese).
[9] Y. Ichimura, Y. Nakayama, M. Miyoshi, T. Akahane, T. Sekiguchi, and Y. Fujiwara, “Text mining system for analysis of a Salesperson’s daily reports,” Proc. of the Pacific Association for Computational Linguistics 2001, pp. 127-135, 2001.
[10] M. Isomoto, H. Nozaki, K. Yoshine, S. Hasegawa, and N. Ishi, “Quantitative verification related to synonym of fuzzy in impression word thesaurus,” Official Journal of Japan Society of Fuzzy Theory and Systems, Vol.8, No.4, pp. 646-656, 1996 (in Japanese).
[11] M. Kataoka, T. Imanaka, K. Mizutani, and N. Wakami, “Key word extraction and related information system intended for text information,” Official Journal of Japan Society of Fuzzy Theory and Systems, Vol.9, No.5, pp. 710-717, 1997 (in Japanese).
[12] C. Kawaguchi, “Introduction to Multivariate Analysis,” Vol.1 and 2, Morikita Publishing, 1998 (in Japanese).
[13] S. Mochida, “Dynamic knowledge set generation system that used Web technology,” Proceedings of Kyushu and Yamaguchi Branch, Bio-Medical Fuzzy Systems Association, August 23, 2003 (in Japanese).
[14] M. Nagamachi, “Basic Study of Kansei Engineering and Application,” Kaibundou publication Ltd., 1993 (in Japanese).
[15] Y. Nakamori, “Fuzzy Quantification Analysis for Kansei Data Analysis and Sensibility Information Processing,” Morikita Publishing, 2000 (in Japanese).
[16] P. N. Tan, H. Blau, S. Harp, and R. Goldman, “Textual data mining of service center call records,” Proc. of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 417-423.
[17] J. Watada, “Fuzzy Quantification Theory,” Chapter 6, In T. Terano, K. Asai, and M. Sugeno (Eds.), “Fuzzy Systems Theory and Its Applications,” pp. 101-123, Academic Press, 1992.
[18] L. A. Zadeh, “Fuzzy Sets,” Information and Control, 8, pp. 338-353, 1965.
[19] L. A. Zadeh, “Probability Measures of Fuzzy Events,” Journal of Mathematical Analysis and Applications, 23, pp. 421-427, 1968.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
