An Iterative Approach for Fuzzy Clustering Based on Feature Significance
Jianchao Han and Mohsen Beheshti
Department of Computer Science, California State University Dominguez Hills, 1000 E. Victoria St., Carson, CA 90747, USA
Clustering is a technique for grouping a set of unlabeled data according to the conceptual clustering principle: maximizing intraclass similarity and minimizing interclass similarity. Existing clustering approaches concentrate on handling different data types and assume that all features play the same role in the clustering process. However, some features may be more significant than others in forming clusters. In this paper, we take feature significance into account and incorporate it into clustering algorithms. We present an iterative fuzzy clustering approach based on feature significance and apply it to the k-means algorithm for numerical data, the k-modes algorithm for categorical data, and the k-prototypes algorithm for mixed data.
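The abstract does not spell out the update rules, so the following is only a minimal sketch, under stated assumptions, of how feature significance can enter an iterative fuzzy k-means loop for numerical data. It is not the authors' algorithm. The weight update borrows the inverse within-cluster-dispersion rule common in feature-weighted k-means work, and the function name `weighted_fuzzy_kmeans` and the parameters `m`, `beta`, `iters`, and `eps` are hypothetical. Each iteration alternates three steps: memberships are computed from a feature-weighted distance, centers are recomputed as membership-weighted means, and feature weights are updated so that features that are compact within clusters count more.

```python
from typing import List, Sequence, Tuple


def weighted_fuzzy_kmeans(
    data: Sequence[Sequence[float]],
    init_centers: Sequence[Sequence[float]],
    m: float = 2.0,      # fuzzifier, must be > 1
    beta: float = 2.0,   # weight exponent, must be > 1 (assumed rule)
    iters: int = 50,
    eps: float = 1e-9,
) -> Tuple[List[List[float]], List[List[float]], List[float]]:
    """Hypothetical feature-weighted fuzzy k-means; returns (U, V, w)."""
    n, d, c = len(data), len(data[0]), len(init_centers)
    v = [list(ctr) for ctr in init_centers]
    w = [1.0 / d] * d                      # start with equal feature significance
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Step 1: memberships from the weighted squared distance
        #   dist(x, v_k) = sum_j w_j^beta * (x_j - v_kj)^2
        for i, x in enumerate(data):
            dist = [sum(w[j] ** beta * (x[j] - v[k][j]) ** 2 for j in range(d))
                    for k in range(c)]
            hits = [k for k in range(c) if dist[k] < eps]
            if hits:  # point sits on a center: share membership among those centers
                u[i] = [1.0 / len(hits) if k in hits else 0.0 for k in range(c)]
            else:
                u[i] = [1.0 / sum((dist[k] / dist[l]) ** (1.0 / (m - 1))
                                  for l in range(c))
                        for k in range(c)]
        # Step 2: centers as u^m-weighted means of the data
        for k in range(c):
            um = [u[i][k] ** m for i in range(n)]
            total = sum(um)
            if total > 0:
                v[k] = [sum(um[i] * data[i][j] for i in range(n)) / total
                        for j in range(d)]
        # Step 3: feature significance from within-cluster dispersion;
        # a feature that is compact inside clusters receives a larger weight.
        disp = [eps + sum(u[i][k] ** m * (data[i][j] - v[k][j]) ** 2
                          for i in range(n) for k in range(c))
                for j in range(d)]
        inv = [(1.0 / dj) ** (1.0 / (beta - 1)) for dj in disp]
        s = sum(inv)
        w = [x / s for x in inv]
    return u, v, w


# Toy data: only the first feature separates the two groups.
pts = [[0.0, 0.0], [0.2, 5.0], [0.1, 10.0],
       [10.0, 0.0], [10.2, 5.0], [10.1, 10.0]]
U, V, w = weighted_fuzzy_kmeans(pts, init_centers=[[0.0, 5.0], [10.0, 5.0]])
print("feature weights:", [round(x, 3) for x in w])
```

On this toy data the second feature is spread out inside each group, so its weight should shrink while the first feature's weight grows. Adapting the sketch to k-modes or k-prototypes would typically mean replacing the squared difference with a simple matching (0/1) dissimilarity for categorical features, which is likewise an assumption here rather than a detail taken from the paper.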