Treemap-Based Cluster Visualization and its Application to Text Data Analysis
Yasufumi Takama, Yuna Tanaka, Yoshiyuki Mori, and Hiroki Shibata
Tokyo Metropolitan University
6-6 Asahigaoka, Hino, Tokyo 191-0065, Japan
This paper proposes a Treemap-based visualization for supporting cluster analysis of multi-dimensional data. Grasping the data distribution of a target dataset is important for tasks such as machine learning and cluster analysis. When dealing with multi-dimensional data such as statistical data and document datasets, dimensionality reduction algorithms are usually applied to project the original data onto a lower-dimensional space. However, dimensionality reduction tends to lose characteristics that the data have in the original space. In particular, the borders between different data groups may not be represented correctly in the lower-dimensional space. To overcome this problem, the proposed visualization method applies Fuzzy c-Means to the target data and visualizes the result with a Treemap on the basis of the highest and second-highest membership values of each object. Visualizing information about not only the closest cluster but also the second-closest one is expected to be useful for identifying objects near the border between different clusters, as well as for understanding the relationships between clusters. A prototype interface is implemented, and its effectiveness is investigated through a user experiment on a news article dataset. As another kind of text data, a case study applying the method to a word embedding space is also presented.
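The core idea, grouping objects by their closest and second-closest clusters, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fuzzy_c_means` and `treemap_hierarchy`, the fuzzifier `m = 2.0`, the random initialization, and the toy 2D points are all assumptions made for illustration. The sketch runs a plain Fuzzy c-Means and then builds the two-level hierarchy (closest cluster, then second-closest cluster) that a Treemap layout could take as input.

```python
import math
import random
from collections import defaultdict


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Plain Fuzzy c-Means (illustrative); returns (memberships, centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: proportional to distance^(-2 / (m - 1)).
        for i in range(n):
            dist = [max(math.dist(points[i], ck), 1e-12) for ck in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            s = sum(inv)
            u[i] = [v / s for v in inv]
    return u, centers


def treemap_hierarchy(u):
    """Group object indices by (closest, second-closest) cluster.

    Assumes at least two clusters. The result maps
    closest cluster -> second-closest cluster -> list of object indices.
    """
    tree = defaultdict(lambda: defaultdict(list))
    for i, row in enumerate(u):
        order = sorted(range(len(row)), key=lambda k: row[k], reverse=True)
        tree[order[0]][order[1]].append(i)
    return tree


# Toy data (hypothetical): three groups plus one point lying between two groups.
points = [
    (0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (0.1, -0.2),
    (4.0, 0.0), (4.2, 0.1), (3.9, -0.1), (4.1, 0.2),
    (2.0, 5.0), (2.1, 5.2), (1.9, 4.8), (2.2, 4.9),
    (2.2, 0.0),
]
memberships, _ = fuzzy_c_means(points, c=3)
for first, children in sorted(treemap_hierarchy(memberships).items()):
    for second, members in sorted(children.items()):
        print(f"closest={first} second={second} objects={members}")
```

In a Treemap built from this hierarchy, each top-level rectangle would correspond to a closest cluster and each nested rectangle to a second-closest cluster. Sizing a rectangle by the number of objects it contains is one possible choice, assumed here rather than taken from the paper.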