Fuzzy c-Means Clustering for Uncertain Data Using Quadratic Penalty-Vector Regularization
Yasunori Endo*, Yasushi Hasegawa**, Yukihiro Hamasuna*, and Yuchi Kanzawa***
*Department of Risk Engineering, Faculty of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
**DENSO Co., Ltd., 1-1 Showa-cho, Kariya, Aichi 448-8661, Japan
***Faculty of Engineering, Shibaura Institute of Technology, 3-7-5 Toyosu, Koto-ku, Tokyo 135-8548, Japan
Clustering is an unsupervised classification method that transforms real-space information into data in a pattern space and analyzes it. It may require data to be represented by sets rather than points because of data uncertainty, e.g., measurement error margins, data regarded as a single point, or missing values. Such uncertainty has usually been represented by intervals, and many clustering algorithms have been constructed for interval data. However, there are no guidelines for selecting among the available distances (such as the nearest-neighbor distance) in individual cases. This makes the selection difficult and raises the need for ways to calculate the dissimilarity between uncertain data without introducing a nearest-neighbor or other distance. The tolerance concept we propose represents uncertain data as a point with a tolerance vector rather than as an interval. Although this representation is convenient for handling uncertain data, the constraints on the tolerance vectors make the mathematical development difficult. We attempt to remove these constraints by quadratic penalty-vector regularization, in which penalty vectors play a role similar to that of tolerance vectors. We also propose clustering algorithms for uncertain data that are formulated as optimization problems and yield optimal solutions, so that uncertainty is handled appropriately.
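The abstract does not state the objective function, so the following is only a minimal sketch under an assumed form. We assume one penalty vector e_k per datum and the objective J(U, E, V) = Σ_k Σ_i u_ki^m ||x_k + e_k − v_i||² + ν Σ_k ||e_k||², with Σ_i u_ki = 1. The weight ν (`nu`), the per-datum form of e_k, and the function name `fcm_penalty` are illustrative assumptions, not taken from the paper. The point it illustrates is that the quadratic term replaces a hard bound on each tolerance vector, so every block (U, E, V) has a closed-form minimizer and alternating optimization needs no constrained solver.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def fcm_penalty(X, c, m=2.0, nu=1.0, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical sketch: alternating optimization of the ASSUMED objective
    J(U, E, V) = sum_k sum_i u_ki^m ||x_k + e_k - v_i||^2 + nu * sum_k ||e_k||^2.
    Returns memberships U (n x c), penalty vectors E (n x p), centers V (c x p)."""
    if m <= 1.0 or nu <= 0.0:
        raise ValueError("require m > 1 and nu > 0")
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    V = [list(x) for x in rng.sample(X, c)]  # initial centers: c distinct data points
    E = [[0.0] * p for _ in range(n)]         # start with zero penalty vectors
    U = [[0.0] * c for _ in range(n)]
    for _ in range(max_iter):
        # U-step: the nu term does not depend on U, so this is the standard
        # FCM membership update computed on the shifted points x_k + e_k.
        for k in range(n):
            y = [X[k][j] + E[k][j] for j in range(p)]
            d = [sq_dist(y, V[i]) for i in range(c)]
            zero = [i for i in range(c) if d[i] == 0.0]
            if zero:
                # shifted point coincides with center(s): share membership among them
                U[k] = [1.0 / len(zero) if i in zero else 0.0 for i in range(c)]
            else:
                U[k] = [1.0 / sum((d[i] / d[j]) ** (1.0 / (m - 1.0)) for j in range(c))
                        for i in range(c)]
        # E-step: setting dJ/de_k = 0 gives
        # e_k = sum_i u_ki^m (v_i - x_k) / (sum_i u_ki^m + nu).
        for k in range(n):
            w = [U[k][i] ** m for i in range(c)]
            denom = sum(w) + nu
            E[k] = [sum(w[i] * (V[i][j] - X[k][j]) for i in range(c)) / denom
                    for j in range(p)]
        # V-step: each center is the u^m-weighted mean of the shifted points.
        new_V = []
        for i in range(c):
            w = [U[k][i] ** m for k in range(n)]
            s = sum(w)
            new_V.append([sum(w[k] * (X[k][j] + E[k][j]) for k in range(n)) / s
                          for j in range(p)])
        shift = max(sq_dist(a, b) for a, b in zip(V, new_V))
        V = new_V
        if shift < tol:  # stop when centers barely move
            break
    return U, E, V
```

For example, `fcm_penalty([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]], c=2)` should place the two tight groups in different clusters. A larger `nu` shrinks the penalty vectors toward zero, so the sketch then behaves like ordinary fuzzy c-means, while a smaller `nu` lets each datum move further toward its centers.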
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.