Comparison of Semi-Supervised Hierarchical Clustering Using Clusterwise Tolerance
Yukihiro Hamasuna* and Yasunori Endo**
*Department of Informatics, School of Science and Engineering, Kinki University, 3-4-1 Kowakae, Higashi-Osaka, Osaka 577-8502, Japan
**Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
This paper presents a new semi-supervised agglomerative hierarchical clustering algorithm with the Ward method using clusterwise tolerance. Semi-supervised clustering has recently attracted attention and has been studied in many research fields. Must-link and cannot-link constraints, together called pairwise constraints, are frequently used to improve clustering performance in semi-supervised clustering. First, clusterwise tolerance based pairwise constraints are introduced in order to handle must-link and cannot-link constraints. Next, a new semi-supervised hierarchical clustering algorithm with the Ward method is constructed on the basis of these constraints. Finally, the effectiveness of the proposed algorithms is verified through numerical examples.
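As a rough illustration of how must-link and cannot-link constraints can interact with Ward-style agglomerative merging, the following Python sketch may help. It is not the clusterwise tolerance method proposed in this paper: it assumes a much simpler scheme in which must-link pairs are merged before clustering starts and any merge that would join a cannot-link pair is skipped. All names (`ward_cost`, `merge`, `constrained_ward`) are hypothetical and chosen only for this example.

```python
from itertools import combinations


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    d2 = sum((x - y) ** 2 for x, y in zip(a["mean"], b["mean"]))
    return a["n"] * b["n"] / (a["n"] + b["n"]) * d2


def merge(a, b):
    """Return the union of two clusters with its size-weighted centroid."""
    n = a["n"] + b["n"]
    mean = [(a["n"] * x + b["n"] * y) / n for x, y in zip(a["mean"], b["mean"])]
    return {"n": n, "mean": mean, "members": a["members"] | b["members"]}


def constrained_ward(points, must_link, cannot_link, n_clusters):
    """Simplified hard-constraint variant (an assumption, not the paper's method)."""
    clusters = [{"n": 1, "mean": list(p), "members": {i}}
                for i, p in enumerate(points)]

    def violates(a, b):
        # True if merging a and b would put a cannot-link pair together.
        merged = a["members"] | b["members"]
        return any(i in merged and j in merged for i, j in cannot_link)

    # Step 1: enforce must-link pairs by merging them up front.
    for i, j in must_link:
        a = next(c for c in clusters if i in c["members"])
        b = next(c for c in clusters if j in c["members"])
        if a is b:
            continue
        if violates(a, b):
            raise ValueError("must-link and cannot-link constraints conflict")
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(merge(a, b))

    # Step 2: repeatedly merge the cheapest pair that respects cannot-link.
    while len(clusters) > n_clusters:
        candidates = [(ward_cost(a, b), ia, ib)
                      for (ia, a), (ib, b) in combinations(enumerate(clusters), 2)
                      if not violates(a, b)]
        if not candidates:
            break  # every remaining merge would violate a cannot-link pair
        _, ia, ib = min(candidates)
        merged = merge(clusters[ia], clusters[ib])
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)

    return [sorted(c["members"]) for c in clusters]


# Toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
unconstrained = constrained_ward(points, [], [], 2)
constrained = constrained_ward(points, [(0, 2)], [(0, 1)], 2)
```

On this toy data the unconstrained run groups the two nearby pairs, while the constrained run is forced to keep points 0 and 2 together and points 0 and 1 apart. A tolerance-based approach such as the one proposed here handles the constraints differently, so this sketch should be read only as background on how pairwise constraints restrict merges.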