Research Paper:
Dynamic Generation of Subordinate Clusters Based on Bayesian Information Criterion for Must-Link Constrained K-Means
Shota Shimizu, Shun Sakayauchi, Hiroki Shibata , and Yasufumi Takama
Graduate School of Systems Design, Tokyo Metropolitan University
6-6 Ashigaoka, Hino, Tokyo 191-0065, Japan
This paper proposes a constrained K-means clustering method that dynamically generates subordinate clusters based on Bayesian information criterion (BIC). COP K-means, which considers a pairwise constraints in partition-based clustering, have difficulty in handling the case that a must-link is given to instances located far away from each other. To address this problem, the proposed method generates subordinate clusters that have a must-link to a master cluster during a clustering process. The final clustering result is obtained by merging the subordinate clusters. The proposed method determines whether to generate subordinate clusters or not based on the BIC. This paper also introduces an idea of mitigating the sensitivity to initial position of subordinate clusters. The effectiveness of the proposed methods is shown through the experiment with two synthetic datasets.
- [1] G. Schwarz, “Estimating the Dimension of a Model,” The Annals of Statistics, Vol.6, No.2, pp. 461-464, 1978. https://doi.org/10.1214/aos/1176344136
- [2] K. Wagstaff et al., “Constrained K-Means Clustering with Background Knowledge,” Proc. of the 18th Int. Conf. on Machine Learning (ICML’01), pp. 577-584, 2001.
- [3] M. Okabe and S. Yamada, “Constrained Clustering with Interactive Similarity Learning,” Proc. of the Joint 5th Int. Conf. on Soft Computing and Intelligent System and 11th Int. Symp. on Advanced Intelligent Systems (SCIS & ISIS 2010), pp. 1295-1300, 2010. https://doi.org/10.14864/softscis.2010.0.1295.0
- [4] J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” Proc. of the 5th Berkeley Symp. on Mathematical Statistics and Probability, Vol.1, pp. 281-297, 1967.
- [5] S. Shimizu et al., “Proposal of Dynamic Generation of Subordinate Clusters Based on Bayesian Information Criterion for Must-Link Constrained K-Means,” The 10th Int. Symp. on Computational Intelligence and Industrial Applications (ISCIIA2022), Session No.C2-3, 2022.
- [6] I. Davidson and S. S. Ravi, “Clustering with Constraints: Feasibility Issues and the k-Means Algorithm,” Proc. of the 2005 SIAM Int. Conf. on Data Mining (SDM), pp. 138-149, 2005. https://doi.org/10.1137/1.9781611972757.13
- [7] D. Pelleg and A. W. Moore, “X-Means: Extending K-Means with Efficient Estimation of the Number of Clusters,” Proc. of the 17th Int. Conf. on Machine Learning (ICML’00), pp. 727-734, 2000.
- [8] K. P. Sinaga and M.-S. Yang, “Unsupervised K-Means Clustering Algorithm,” IEEE Access, Vol.8, pp. 80716-80727, 2020. https://doi.org/10.1109/ACCESS.2020.2988796
- [9] R. R. Sokal and C. D. Michener, “A Statistical Method for Evaluating Systematic Relationships,” University of Kansas Science Bulletin, Vol.38, Part 2, pp. 1409-1438, 1958.
- [10] J. H. Ward Jr., “Hierarchical Grouping to Optimize an Objective Function,” J. of the American Statistical Association, Vol.58, No.301, pp. 236-244, 1963. https://doi.org/10.1080/01621459.1963.10500845
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.