Two-Stage Clustering Based on Cluster Validity Measures

Yukihiro Hamasuna; Ryo Ozaki; Yasunori Endo

doi:10.20965/jaciii.2018.p0054

JACIII Vol.22 No.1 pp. 54-61 (2018)

Paper:

Two-Stage Clustering Based on Cluster Validity Measures

Yukihiro Hamasuna*, Ryo Ozaki**, and Yasunori Endo***

*Department of Informatics, School of Science and Engineering, Kindai University
3-4-1 Kowakae, Higashiosaka, Osaka 577-8502, Japan

**Graduate School of Science and Engineering, Kindai University
3-4-1 Kowakae, Higashiosaka, Osaka 577-8502, Japan

***Faculty of Engineering, Information and Systems, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

Received: November 25, 2016
Accepted: October 7, 2017
Published: January 20, 2018

Keywords: two-stage clustering, cluster validity measures, kernel method, c-means clustering, agglomerative hierarchical clustering

Abstract

To handle a large number of objects, a two-stage clustering method has previously been proposed. That method generates a large number of clusters in the first stage and merges them in the second stage. This paper proposes a novel two-stage clustering method that uses cluster validity measures as the merging criterion in the second stage. Cluster validity measures, which are used to evaluate cluster partitions and to determine a suitable number of clusters, serve as the criteria for merging clusters. The performance of the proposed method with six typical indices is compared on eight artificial datasets. The experiments show that the trace of the fuzzy covariance matrix W_tr and its kernelization KW_tr are quite effective in the proposed method and yield better results than the other indices.
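The two-stage idea in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not define the six indices, so the sketch uses a crisp (non-fuzzy) sum of per-cluster covariance traces as a hypothetical stand-in for W_tr. The function names, the plain k-means first stage, the greedy pairwise merge, and stopping at a user-supplied target number of clusters are likewise assumptions made for the example.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def kmeans(data, c, iters=50, seed=0):
    """Stage 1 (assumed): plain Lloyd-style k-means producing many small clusters."""
    rng = random.Random(seed)
    centers = rng.sample(data, c)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        clusters = [g for g in clusters if g]  # drop clusters that became empty
        new_centers = [centroid(g) for g in clusters]
        if new_centers == centers:
            break
        centers = new_centers
    return clusters


def trace_measure(clusters):
    """Hypothetical crisp stand-in for a validity measure (NOT the paper's fuzzy W_tr):
    the sum over clusters of the trace of each cluster's covariance matrix."""
    total = 0.0
    for g in clusters:
        v = centroid(g)
        total += sum(sq_dist(p, v) for p in g) / len(g)
    return total


def merge_by_validity(clusters, target, measure=trace_measure):
    """Stage 2 (assumed): repeatedly merge the pair of clusters whose merge gives
    the smallest measure value, until `target` clusters remain."""
    clusters = [list(g) for g in clusters]
    while len(clusters) > target:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            candidate = [g for k, g in enumerate(clusters) if k not in (i, j)]
            candidate.append(clusters[i] + clusters[j])
            score = measure(candidate)
            if best is None or score < best[0]:
                best = (score, candidate)
        clusters = best[1]
    return clusters


def two_stage(data, c_init, target, seed=0):
    """Run both stages: over-segment with k-means, then merge by the measure."""
    return merge_by_validity(kmeans(data, c_init, seed=seed), target)


data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
final = two_stage(data, c_init=4, target=2)
print(len(final), "clusters")
```

Passing a different function as `measure` is how another index would be swapped in. Whether a given index should be minimized or maximized, and when merging should stop, would have to follow the definitions in the paper itself.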

Cite this article as:

Y. Hamasuna, R. Ozaki, and Y. Endo, “Two-Stage Clustering Based on Cluster Validity Measures,” J. Adv. Comput. Intell. Intell. Inform., Vol.22 No.1, pp. 54-61, 2018.

References

[1] A. K. Jain, “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, Vol.31, No.8, pp. 651-666, 2010.
[2] J. C. Bezdek, “Pattern Recognition with Fuzzy Objective Function Algorithms,” Plenum Press, New York, 1981.
[3] S. Miyamoto, H. Ichihashi, and K. Honda, “Algorithms for Fuzzy Clustering,” Springer, Heidelberg, 2008.
[4] S. Miyamoto, “Introduction to Cluster Analysis,” Morikita-shuppan, 1999 (in Japanese).
[5] G. Hamerly, “Making k-means even faster,” Proc. of the 2010 SIAM Int. Conf. on Data Mining (SDM), pp. 130-140, 2010.
[6] T. C. Havens, J. C. Bezdek, C. Leckie, L. O. Hall, and M. Palaniswami, “Fuzzy c-means algorithms for very large data,” IEEE Trans. on Fuzzy Systems, Vol.20, No.6, pp. 1130-1146, 2012.
[7] N. Obara and S. Miyamoto, “A method of two-stage clustering with constraints using agglomerative hierarchical algorithm and one-pass k-means,” Proc. of The 6th Int. Conf. on Soft Computing and Intelligent Systems The 13th Int. Symposium on Advanced Intelligent Systems (SCIS&ISIS2012), pp. 1540-1544, 2012.
[8] Y. Tamura and S. Miyamoto, “Two-stage clustering using one-pass k-medoids and medoid-based agglomerative hierarchical algorithms,” Proc. of The 7th Int. Conf. on Soft Computing and Intelligent Systems The 15th Int. Symposium on Advanced Intelligent Systems (SCIS&ISIS2014), pp. 484-488, 2014.
[9] Y. Tamura and S. Miyamoto, “A method of two stage clustering using agglomerative hierarchical algorithms with one-pass k-means++ or k-median++,” Proc. of The 2014 IEEE Int. Conf. on Granular Computing (GrC2014), pp. 281-285, 2014.
[10] D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.1, No.2, pp. 224-227, 1979.
[11] I. Gath and A. B. Geva, “Unsupervised optimal fuzzy clustering,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.11, No.7, pp. 773-780, 1989.
[12] X. L. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.13, No.8, pp. 841-847, 1991.
[13] W. Wang and Y. Zhang, “On fuzzy cluster validity indices,” Fuzzy Sets and Systems, Vol.158, No.19, pp. 2095-2117, 2007.
[14] R. Ozaki, Y. Hamasuna, and Y. Endo, “A method of two-stage clustering based on cluster validity measures,” Proc. of The 8th Int. Conf. on Soft Computing and Intelligent Systems The 17th Int. Symposium on Advanced Intelligent Systems (SCIS&ISIS2016), pp. 410-415, 2016.
[15] W. Hashimoto, T. Nakamura, and S. Miyamoto, “Comparison and evaluation of different cluster validity measures including their kernelization,” J. Adv. Comput. Intell. Intell. Inform., Vol.13, No.3, pp. 204-209, 2009.
[16] Y. Fukuyama and M. Sugeno, “A new method of choosing the number of clusters for fuzzy c-means method,” Proc. of 5th Fuzzy System Symposium, pp. 247-250, 1989.
[17] M. Girolami, “Mercer kernel-based clustering in feature space,” IEEE Trans. on Neural Networks, Vol.13, No.3, pp. 780-784, 2002.
[18] Y. Endo, H. Haruyama, and T. Okubo, “On some hierarchical clustering algorithms using kernel functions,” Proc. of IEEE Int. Conf. on Fuzzy Systems (FUZZ-IEEE2004), pp. 1513-1518, 2004.
[19] L. Hubert and P. Arabie, “Comparing Partitions,” J. of Classification, Vol.2, No.1, pp. 193-218, 1985.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
