JACIII Vol.28 No.1 pp. 86-93
doi: 10.20965/jaciii.2024.p0086

Research Paper:

Dynamic Generation of Subordinate Clusters Based on Bayesian Information Criterion for Must-Link Constrained K-Means

Shota Shimizu, Shun Sakayauchi, Hiroki Shibata ORCID Icon, and Yasufumi Takama ORCID Icon

Graduate School of Systems Design, Tokyo Metropolitan University
6-6 Ashigaoka, Hino, Tokyo 191-0065, Japan

March 19, 2023
August 16, 2023
January 20, 2024
clustering, constrained clustering, Bayesian information criterion (BIC)

This paper proposes a constrained K-means clustering method that dynamically generates subordinate clusters based on Bayesian information criterion (BIC). COP K-means, which considers a pairwise constraints in partition-based clustering, have difficulty in handling the case that a must-link is given to instances located far away from each other. To address this problem, the proposed method generates subordinate clusters that have a must-link to a master cluster during a clustering process. The final clustering result is obtained by merging the subordinate clusters. The proposed method determines whether to generate subordinate clusters or not based on the BIC. This paper also introduces an idea of mitigating the sensitivity to initial position of subordinate clusters. The effectiveness of the proposed methods is shown through the experiment with two synthetic datasets.

Cite this article as:
S. Shimizu, S. Sakayauchi, H. Shibata, and Y. Takama, “Dynamic Generation of Subordinate Clusters Based on Bayesian Information Criterion for Must-Link Constrained K-Means,” J. Adv. Comput. Intell. Intell. Inform., Vol.28 No.1, pp. 86-93, 2024.
Data files:
  1. [1] G. Schwarz, “Estimating the Dimension of a Model,” The Annals of Statistics, Vol.6, No.2, pp. 461-464, 1978.
  2. [2] K. Wagstaff et al., “Constrained K-Means Clustering with Background Knowledge,” Proc. of the 18th Int. Conf. on Machine Learning (ICML’01), pp. 577-584, 2001.
  3. [3] M. Okabe and S. Yamada, “Constrained Clustering with Interactive Similarity Learning,” Proc. of the Joint 5th Int. Conf. on Soft Computing and Intelligent System and 11th Int. Symp. on Advanced Intelligent Systems (SCIS & ISIS 2010), pp. 1295-1300, 2010.
  4. [4] J. MacQueen, “Some Methods for Classification and Analysis of Multivariate Observations,” Proc. of the 5th Berkeley Symp. on Mathematical Statistics and Probability, Vol.1, pp. 281-297, 1967.
  5. [5] S. Shimizu et al., “Proposal of Dynamic Generation of Subordinate Clusters Based on Bayesian Information Criterion for Must-Link Constrained K-Means,” The 10th Int. Symp. on Computational Intelligence and Industrial Applications (ISCIIA2022), Session No.C2-3, 2022.
  6. [6] I. Davidson and S. S. Ravi, “Clustering with Constraints: Feasibility Issues and the k-Means Algorithm,” Proc. of the 2005 SIAM Int. Conf. on Data Mining (SDM), pp. 138-149, 2005.
  7. [7] D. Pelleg and A. W. Moore, “X-Means: Extending K-Means with Efficient Estimation of the Number of Clusters,” Proc. of the 17th Int. Conf. on Machine Learning (ICML’00), pp. 727-734, 2000.
  8. [8] K. P. Sinaga and M.-S. Yang, “Unsupervised K-Means Clustering Algorithm,” IEEE Access, Vol.8, pp. 80716-80727, 2020.
  9. [9] R. R. Sokal and C. D. Michener, “A Statistical Method for Evaluating Systematic Relationships,” University of Kansas Science Bulletin, Vol.38, Part 2, pp. 1409-1438, 1958.
  10. [10] J. H. Ward Jr., “Hierarchical Grouping to Optimize an Objective Function,” J. of the American Statistical Association, Vol.58, No.301, pp. 236-244, 1963.

*This site is desgined based on HTML5 and CSS3 for modern browsers, e.g. Chrome, Firefox, Safari, Edge, Opera.

Last updated on Feb. 19, 2024