Comparison and Evaluation of Different Cluster Validity Measures Including Their Kernelization
Wataru Hashimoto, Tetsuya Nakamura, and Sadaaki Miyamoto
Department of Risk Engineering, School of Systems and Information Engineering, University of Tsukuba, Ibaraki 305-8573, Japan
Many measures have been proposed for cluster validity, but they have yet to be compared on a sufficient number of numerical examples. We compare the performance of five measures: the sum of determinants and the sum of traces of the fuzzy covariance matrices of clusters, the Xie-Beni index, the Davies-Bouldin index, and the Fukuyama-Sugeno index, together with their kernelized versions, focusing on algorithms for calculating the kernelized measures. We evaluate the effectiveness of these indices using thousands of automatically generated clusters. We find that no single measure outperforms the others; that, contrary to the common understanding that determinants are better than traces, the sum of traces performs as well as the sum of determinants; and that kernelized measures perform as well as nonkernelized ones.
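To make the idea of a kernelized validity measure concrete, the following is a minimal sketch of the Xie-Beni index computed two ways: directly from data and cluster centers, and from a kernel matrix alone. It is not the authors' algorithm. The function names `xie_beni` and `kernel_xie_beni`, the fuzzifier default `m=2.0`, and the choice to represent each feature-space center as a membership-weighted mean of the mapped points are assumptions made for illustration.

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni index from data X (n, d), memberships U (c, n), centers V (c, d)."""
    # Squared distance from every center to every point: shape (c, n).
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    compactness = (U ** m * d2).sum()
    # Squared distances between centers; mask the zero diagonal.
    s2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(s2, np.inf)
    return compactness / (X.shape[0] * s2.min())

def kernel_xie_beni(K, U, m=2.0):
    """Same index using only a kernel matrix K (n, n); centers stay implicit.

    Assumption: center i is sum_k W[i, k] * phi(x_k), with W the
    row-normalized memberships raised to the power m.
    """
    W = U ** m
    W = W / W.sum(axis=1, keepdims=True)
    G = W @ K @ W.T                      # (c, c) inner products between centers
    g = np.diag(G)
    # ||phi(x_k) - v_i||^2 = K_kk - 2 (W K)_ik + G_ii
    d2 = np.diag(K)[None, :] - 2.0 * (W @ K) + g[:, None]
    # ||v_i - v_j||^2 = G_ii - 2 G_ij + G_jj
    s2 = g[:, None] - 2.0 * G + g[None, :]
    np.fill_diagonal(s2, np.inf)
    return (U ** m * d2).sum() / (K.shape[0] * s2.min())

# Usage: with the linear kernel K = X X^T the two versions should agree.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
U = rng.random((2, 40))
U /= U.sum(axis=0)                       # memberships of each point sum to 1
W = U ** 2 / (U ** 2).sum(axis=1, keepdims=True)
V = W @ X                                # membership-weighted centers
xb_plain = xie_beni(X, U, V)
xb_kernel = kernel_xie_beni(X @ X.T, U)
```

Swapping `X @ X.T` for another positive semidefinite kernel matrix (for example a Gaussian kernel) gives the index in the corresponding feature space without ever forming the mapped points explicitly.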
-  J.C. Bezdek, “Pattern Recognition with Fuzzy Objective Function Algorithms,” Plenum, New York, 1981.
-  F. Höppner, F. Klawonn, R. Kruse, and T. Runkler, “Fuzzy Cluster Analysis,” Wiley, Chichester, 1999.
-  M. Girolami, “Mercer kernel based clustering in feature space,” IEEE Trans. on Neural Networks, Vol.13, No.3, pp. 780-784, 2002.
-  S. Miyamoto and D. Suizu, “Fuzzy c-means clustering using kernel functions in support vector machines,” Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.7, No.1, pp. 25-30, 2003.
-  C.J.C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, Vol.2, pp. 121-167, 1998.
-  V.N. Vapnik, “Statistical Learning Theory,” Wiley, New York, 1998.
-  J.C. Dunn, “A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters,” J. of Cybernetics, Vol.3, pp. 32-57, 1974.
-  D. Dumitrescu, B. Lazzerini, and L.C. Jain, “Fuzzy Sets and their Application to Clustering and Training,” CRC Press, Boca Raton, Florida, 2000.
-  I. Gath and A.B. Geva, “Unsupervised optimal fuzzy clustering,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.11, No.7, pp. 773-781, 1989.
-  X.L. Xie and G. Beni, “A validity measure for fuzzy clustering,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.13, No.4, pp. 841-846, 1991.
-  Y. Fukuyama and M. Sugeno, “A new method for choosing the number of clusters for the fuzzy c-means method,” Proc. 5th Fuzzy System Symposium, pp. 247-250, July 1989.
-  D.L. Davies and D.W. Bouldin, “A cluster separation measure,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.1, No.2, pp. 95-104, 1979.
-  S. Miyamoto and Y. Nakayama, “Algorithms of hard c-means clustering using kernel functions in support vector machines,” Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.7, No.1, pp. 19-24, 2003.