Even-Sized Clustering Based on Optimization and its Variants

Yasunori Endo; Yukihiro Hamasuna; Tsubasa Hirano; Naohiko Kinoshita

doi:10.20965/jaciii.2018.p0062

single-jc.php

« previous

JACIII Vol.22 No.1 pp. 62-69

doi: 10.20965/jaciii.2018.p0062

(2018)

Paper:

Views over last 60 days: 3,467

Even-Sized Clustering Based on Optimization and its Variants

Yasunori Endo^, Yukihiro Hamasuna^, Tsubasa Hirano^, and Naohiko Kinoshita^***

^*Faculty of Engineering, Information and Systems, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

^**Department of Informatics, Kindai University
3-4-1 Kowakae, Higashiosaka, Osaka 577-8502, Japan

^***Department of Risk Engineering, University of Tsukuba
1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan

Received:

January 10, 2017

Accepted:

October 7, 2017

Published:

January 20, 2018

Keywords:

even-sized clustering, k-member clustering, optimization, linear programming

Abstract

A clustering method referred to as K-member clustering classifies a dataset into certain clusters, the size of which is more than a given constant K. Even-sized clustering, which classifies a dataset into even-sized clusters, is also considered along with K-member clustering. In our previous study, we proposed Even-sized Clustering Based on Optimization (ECBO) to output adequate results by formulating an even-sized clustering problem as linear programming. The simplex method is used to calculate the belongingness of each object to clusters in ECBO. In this study, ECBO is extended by introducing ideas that were introduced in K-means or fuzzy c-means to resolve problems of initial-value dependence, robustness against outliers, calculation costs, and nonlinear boundaries of clusters. We also reconsider the relation between the dataset size, the cluster number, and K in ECBO. Moreover, we verify the effectiveness of the variants of ECBO based on experimental results using synthetic datasets and a benchmark dataset.

Cite this article as:

Y. Endo, Y. Hamasuna, T. Hirano, and N. Kinoshita, “Even-Sized Clustering Based on Optimization and its Variants,” J. Adv. Comput. Intell. Intell. Inform., Vol.22 No.1, pp. 62-69, 2018.

Data files:

References

[1] V. Torra, “Fuzzy microaggregation for the transparency principle,” J. of Applied Logic, Vol.23, pp. 70-80, 2017.
[2] J.-W. Byun, A. Kamra, E. Bertino, and N. Li, “Efficient k-Anonymization Using Clustering Techniques,” Proc. of the 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA 2007), pp. 188-200, 2007.
[3] J.-L. Lin and M.-C. Wei, “An efficient clustering method for k-anonymization,” Proc. of the 2008 Int. workshop on Privacy and anonymity in information society (PAIS ’08), pp. 46-50, 2008.
[4] X. He, H. H. Chen, Y. Chen, Y. Dong, P. Wang, and Z. Huang, “Clustering-Based k-Anonymity,” Proc. of the 16th Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining (PAKDD 2012), pp. 405-417, 2012.
[5] Y. Ogata and Y. Endo, “A Note on the K-Member Clustering Problem,” The 29th Fuzzy System Symp. (FSS 2013), MB2-2, 2013 (in Japanese).
[6] S. Miyamoto, H. Ichihashi, and K. Honda, “Algorithms for Fuzzy Clustering,” Springer, Heidelberg, 2008.
[7] T. Hirano, Y. Endo, N. Kinoshita, and Y. Hamasuna, “On Even-sized Clustering Algorithm Based on Optimization,” Proc. of Joint 7rd Int. Conf. on Soft Computing and Intelligent Systems and 15th Int. Symp. on Advanced Intelligent Systems (SCIS & ISIS 2014), TP4-3-5-(3), #69, 2014.
[8] D. Arthur and S. Vassilvitskii, “k-Means++: the Advantages of Careful Seeding,” Proc. of the 18th Annual ACM-SIAM Symp. on Discrete Algorithms, Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027-1035, 2007.
[9] S. Miyamoto and Y. Augusta, “An Efficient Algorithm for l₁ Fuzzy c-Means and its Termination,” Control and Cybernetics, Vol.24, pp. 421-436, 1995.
[10] L. Kaufman and P. J. Rousseeuw, “Finding groups in data: An introduction to cluster analysis,” Wiley, 1990.
[11] H.-S. Park and C.-H. Jun, “Simple and Fast Algorithm for K-Medoids Clustering,” Expert Systems with Applications, Vol.36, Issue 2, pp. 3336-3341, 2009.
[12] M. Girolami, “Mercer Kernel-Based Clustering in Feature Space,” IEEE Trans. on Neural Networks, Vol.13, No.3, pp. 780-784, 2002.
[13] Y. Endo, S. Miyamoto, “Spherical k-Means++ Clustering,” The 12th Int. Conf. on Modeling Decisions for Artificial Intelligence (MDAI 2015), LNAI 9321, Springer, pp. 103-114, 2015.
[14] L. Hubert, P. Arabie, “Comparing Partitions,” J. of Classification, Vol.2, pp. 193-218, 1985.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 Internationa License.

[1] [1] V. Torra, “Fuzzy microaggregation for the transparency principle,” J. of Applied Logic, Vol.23, pp. 70-80, 2017.

[2] [2] J.-W. Byun, A. Kamra, E. Bertino, and N. Li, “Efficient k-Anonymization Using Clustering Techniques,” Proc. of the 12th Int. Conf. on Database Systems for Advanced Applications (DASFAA 2007), pp. 188-200, 2007.

[3] [3] J.-L. Lin and M.-C. Wei, “An efficient clustering method for k-anonymization,” Proc. of the 2008 Int. workshop on Privacy and anonymity in information society (PAIS ’08), pp. 46-50, 2008.

[4] [4] X. He, H. H. Chen, Y. Chen, Y. Dong, P. Wang, and Z. Huang, “Clustering-Based k-Anonymity,” Proc. of the 16th Pacific-Asia Conf. on Advances in Knowledge Discovery and Data Mining (PAKDD 2012), pp. 405-417, 2012.

[5] [5] Y. Ogata and Y. Endo, “A Note on the K-Member Clustering Problem,” The 29th Fuzzy System Symp. (FSS 2013), MB2-2, 2013 (in Japanese).

[6] [6] S. Miyamoto, H. Ichihashi, and K. Honda, “Algorithms for Fuzzy Clustering,” Springer, Heidelberg, 2008.

[7] [7] T. Hirano, Y. Endo, N. Kinoshita, and Y. Hamasuna, “On Even-sized Clustering Algorithm Based on Optimization,” Proc. of Joint 7rd Int. Conf. on Soft Computing and Intelligent Systems and 15th Int. Symp. on Advanced Intelligent Systems (SCIS & ISIS 2014), TP4-3-5-(3), #69, 2014.

[8] [8] D. Arthur and S. Vassilvitskii, “k-Means++: the Advantages of Careful Seeding,” Proc. of the 18th Annual ACM-SIAM Symp. on Discrete Algorithms, Society for Industrial and Applied Mathematics Philadelphia, PA, USA. pp. 1027-1035, 2007.

[9] [9] S. Miyamoto and Y. Augusta, “An Efficient Algorithm for l₁ Fuzzy c-Means and its Termination,” Control and Cybernetics, Vol.24, pp. 421-436, 1995.

[10] [10] L. Kaufman and P. J. Rousseeuw, “Finding groups in data: An introduction to cluster analysis,” Wiley, 1990.

[11] [11] H.-S. Park and C.-H. Jun, “Simple and Fast Algorithm for K-Medoids Clustering,” Expert Systems with Applications, Vol.36, Issue 2, pp. 3336-3341, 2009.

[12] [12] M. Girolami, “Mercer Kernel-Based Clustering in Feature Space,” IEEE Trans. on Neural Networks, Vol.13, No.3, pp. 780-784, 2002.

[13] [13] Y. Endo, S. Miyamoto, “Spherical k-Means++ Clustering,” The 12th Int. Conf. on Modeling Decisions for Artificial Intelligence (MDAI 2015), LNAI 9321, Springer, pp. 103-114, 2015.

[14] [14] L. Hubert, P. Arabie, “Comparing Partitions,” J. of Classification, Vol.2, pp. 193-218, 1985.

Even-Sized Clustering Based on Optimization and its Variants

Yasunori Endo*, Yukihiro Hamasuna**, Tsubasa Hirano***, and Naohiko Kinoshita***

Yasunori Endo^, Yukihiro Hamasuna^, Tsubasa Hirano^, and Naohiko Kinoshita^***