Visualization of the Internet News Based on Efficient Self-Organizing Map Using Restricted Region Search and Dimensionality Reduction
Tetsuya Toyota*,** and Hajime Nobuhara*
*Department of Intelligent Interaction Technologies, University of Tsukuba, 1-1-1 Tenoudai, Tsukuba Science City, Ibaraki 305-8573, Japan
**Japan Society for the Promotion of Science, Sumitomo Ichibancho FS Bldg., 8 Ichibancho, Chiyoda-ku, Tokyo 102-8472, Japan
In this paper, we propose a system that visualizes the relationships among large quantities of Internet news on a two-dimensional self-organizing map (SOM), instead of the conventional approach of listing news items. In the proposed method, morphological analysis is applied to the news texts to generate input vectors whose elements correspond to keywords. Because many elements of these vectors are zero, a sparsity characteristic of Internet news, the vectors can be reduced in dimension, and self-organizing map learning can be sped up by restricting the search region. Evaluation experiments on 80 news items verify that the proposed system reduces computation time by 75% to 99% and creates a more efficient SOM than the generally available SOM.
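The pipeline summarized in the abstract can be illustrated with a minimal sketch. The abstract does not say how dimensions are reduced or how the search region is restricted, so the choices below are assumptions for illustration only and not the authors' implementation. Whitespace-split tokens stand in for the output of morphological analysis, dimensions are reduced by dropping keywords that occur in fewer than `min_df` documents, and the best-matching-unit search for each item is restricted to a small window around the unit that item matched in the previous epoch. The names `keyword_vectors`, `train_som`, `min_df`, and `window` are hypothetical.

```python
import math
import random


def keyword_vectors(docs, min_df=2):
    """Build term-count vectors from tokenized documents.

    Assumed reduction rule: keep only keywords that appear in at
    least min_df documents, which shrinks the sparse vector space.
    """
    df = {}
    for tokens in docs:
        for t in set(tokens):
            df[t] = df.get(t, 0) + 1
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for tokens in docs:
        v = [0.0] * len(vocab)
        for t in tokens:
            if t in index:
                v[index[t]] += 1.0
        vectors.append(v)
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_som(vectors, rows=6, cols=6, epochs=20, window=2, seed=0):
    """Train a rows x cols SOM with a restricted best-matching-unit search.

    Assumed restriction rule: after the first full search, each input
    only searches the (2*window+1)^2 block around its previous BMU.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    last_bmu = [None] * len(vectors)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs  # decays linearly toward 0
        lr = 0.5 * frac
        radius = max(1.0, (max(rows, cols) / 2.0) * frac)
        for i, v in enumerate(vectors):
            if last_bmu[i] is None:
                candidates = list(weights)  # first pass: full search
            else:
                r0, c0 = last_bmu[i]
                candidates = [
                    (r, c)
                    for r in range(max(0, r0 - window), min(rows, r0 + window + 1))
                    for c in range(max(0, c0 - window), min(cols, c0 + window + 1))
                ]
            bmu = min(candidates, key=lambda n: sq_dist(weights[n], v))
            last_bmu[i] = bmu
            # Pull the BMU and its grid neighbors toward the input.
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                if grid_d2 <= radius ** 2:
                    h = lr * math.exp(-grid_d2 / (2 * radius ** 2))
                    for k in range(dim):
                        w[k] += h * (v[k] - w[k])
    return weights, last_bmu


# Toy data: four already-tokenized "news items" (illustrative only).
docs = [
    "election vote party".split(),
    "vote party minister".split(),
    "match goal team".split(),
    "team goal league".split(),
]
vocab, vectors = keyword_vectors(docs, min_df=2)
weights, cells = train_som(vectors, rows=4, cols=4, epochs=10)
```

Here `cells[i]` is the map coordinate assigned to news item `i`, which is what a visualization would plot. In this sketch the restricted search compares each input against at most 25 units per step instead of the full grid, which is the kind of saving a restricted search region aims for; the actual speedup reported in the abstract depends on the authors' own method and data.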