Variable Weighting in PCA-Guided k-Means and its Connection with Information Summarization
Katsuhiro Honda, Akira Notsu, and Hidetomo Ichihashi
Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Nakaku, Sakai, Osaka 599-8531, Japan
This paper proposes a variable selection model for k-Means in which a variable weighting mechanism is introduced into PCA-guided k-Means. Variable weights are estimated in a manner similar to fuzzy c-means (FCM) clustering, while the membership indicator is derived by a PCA-guided method in which the principal component scores are calculated taking the variable weights into account. In calculating the membership indicators, the variable weights emphasize the variables that carry meaningful cluster information, and the absolute responsibility of each variable is revealed by a soft transition to possibilistic values. It is also shown that the variable weights are derived in a manner similar to variable selection for PCA, whose goal is information summarization. The characteristics of the proposed method are demonstrated in an application to document clustering.
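The alternation described above, between a PCA-guided membership step and a variable weight step, can be illustrated with a minimal sketch. It is not the formulation of the paper. It assumes the two-cluster case, where the sign of the first principal component score of the weighted, centered data serves as the membership indicator, and it uses a hypothetical entropy-style weight update in which variables with small within-cluster scatter relative to their total scatter receive larger weights. The function names, the parameter `lam`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def weighted_pca_two_means(X, weights):
    """Two-cluster indicator from the sign of the first principal
    component score, computed on weighted, centered data."""
    Xc = X - X.mean(axis=0)
    # Scaling by sqrt(w_j) makes squared distances weighted by w_j.
    Xw = Xc * np.sqrt(weights)
    # The first right singular vector is the first principal axis.
    _, _, vt = np.linalg.svd(Xw, full_matrices=False)
    scores = Xw @ vt[0]
    return (scores > 0).astype(int)

def update_weights(X, labels, lam=1.0):
    """Hypothetical weight update (an assumption, not the paper's rule):
    w_j is proportional to exp(-within_j / (lam * total_j)), then
    normalized so that the weights sum to one."""
    Xc = X - X.mean(axis=0)
    total = (Xc ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for k in np.unique(labels):
        Xk = X[labels == k]
        within += ((Xk - Xk.mean(axis=0)) ** 2).sum(axis=0)
    ratio = within / np.maximum(total, 1e-12)
    w = np.exp(-ratio / lam)
    return w / w.sum()

# Toy data (assumed): variable 0 separates two groups, variable 1 is noise.
rng = np.random.default_rng(0)
informative = np.concatenate([rng.normal(-3, 0.5, 50), rng.normal(3, 0.5, 50)])
noise = rng.normal(0, 1, 100)
X = np.column_stack([informative, noise])

w = np.full(X.shape[1], 1.0 / X.shape[1])  # start from uniform weights
for _ in range(5):
    labels = weighted_pca_two_means(X, w)
    w = update_weights(X, labels, lam=0.5)

print("variable weights:", w)
```

In this toy setting the weight of the informative variable grows at the expense of the noise variable, which mirrors the intended effect of emphasizing variables that carry cluster information. The constraint that weights sum to one corresponds to the probabilistic (FCM-like) case; relaxing it toward possibilistic values is not shown here.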