FCM-Type Fuzzy Clustering of Mixed Databases Considering Nominal Variable Quantification
Katsuhiro Honda, Ryo Uesugi, and Hidetomo Ichihashi
Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Nakaku, Sakai, Osaka 599-8531, Japan
This paper proposes a clustering algorithm that performs FCM-type clustering of datasets that include categorical data. The proposed algorithm repeats categorical data quantification within the FCM clustering iterations so that the quantified scores suit the current fuzzy partition. The objective function is a linear combination of two cost functions: the objective function of FCM clustering and a clustering criterion for the quantified category scores. Because the quantified category scores are assigned by considering the relationships among categories, they are useful for interpreting the cluster structure.
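For orientation, the following is a minimal sketch of the standard FCM iteration (Bezdek's alternating update of memberships and cluster centers) on numeric data only. It is not the proposed algorithm: the category quantification step and the weight of the linear combination are not specified in this abstract and are therefore not reproduced. The function name `fcm`, its parameters, and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on numeric data (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Cluster centers: means weighted by memberships raised to m.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, shape (n_samples, n_clusters).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: proportional to d2 ** (-1 / (m - 1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # FCM objective: sum of u_ci^m times squared distance.
    objective = float(((U ** m) * d2).sum())
    return U, centers, objective
```

In a mixed-data setting of the kind described above, one would presumably replace each categorical variable by numeric category scores before calling such a routine and re-estimate those scores between iterations; that part is left out here because the abstract does not give its update rule.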