A Study of Support Vector Regression-Based Fuzzy c-Means Algorithm on Incomplete Data Clustering

Maolin Shi; Zihao Wang

doi:10.20965/jaciii.2022.p0483

JACIII Vol.26 No.4 pp. 483-494 (2022)

Paper:

Maolin Shi*,**,† and Zihao Wang***

*School of Agricultural Engineering, Jiangsu University
301, Xuefu Road, Zhenjiang, Jiangsu Province 212013, China

**Zhonghui Rubber Technology Co., Ltd.
Yuqi Industrial Zone, Wuxi, Jiangsu 214183, China

***International School of Information Science and Engineering, Dalian University of Technology
No.2 Linggong Road, Ganjingzi District, Dalian City, Liaoning Province 116024, China

†Corresponding author

Received: December 26, 2021

Accepted: March 17, 2022

Published: July 20, 2022

Keywords: data clustering, incomplete data, fuzzy clustering, support vector regression

Abstract

The support vector regression-based fuzzy c-means algorithm (SVR-FCM) clusters data according to the relationships among their attributes and can provide competitive clustering results for datasets whose attributes are functionally related. In this paper, we study the performance of SVR-FCM on incomplete data clustering. The conventional incomplete data clustering strategies of the fuzzy c-means algorithm (FCM) are first applied to SVR-FCM, and a new strategy, named the MIS strategy, is designed to help SVR-FCM handle incomplete data as well. A number of synthetic datasets are used to study how the data missing rate and the number of missing attributes affect the performance of SVR-FCM under the different incomplete data clustering strategies. Several engineering datasets are used to test the existing and proposed incomplete data clustering strategies for SVR-FCM. The results indicate that SVR-FCM can provide better clustering results than FCM for datasets with functional relationships among attributes, even when values are missing, and that the proposed MIS strategy helps SVR-FCM achieve the best clustering results on most datasets.
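
The abstract does not spell out the conventional strategies it refers to. One well-known FCM strategy for incomplete data is the partial distance strategy described by Hathaway and Bezdek [21], and the following is a minimal Python sketch of that idea only. It is not SVR-FCM and not the proposed MIS strategy; the function names and the use of `None` to mark a missing value are illustrative assumptions.

```python
def partial_distance(x, v):
    """Squared partial distance between sample x and prototype v.

    Missing attributes in x are marked with None (an assumption of this
    sketch). Only observed attributes contribute, and the sum is rescaled
    by (total attributes / observed attributes).
    """
    observed = [(xj, vj) for xj, vj in zip(x, v) if xj is not None]
    if not observed:
        raise ValueError("sample has no observed attributes")
    scale = len(x) / len(observed)
    return scale * sum((xj - vj) ** 2 for xj, vj in observed)


def memberships(x, prototypes, m=2.0):
    """Standard FCM membership update for one sample, using partial distances.

    m is the fuzzifier (m > 1). Distances are squared, so the exponent
    is 1 / (m - 1).
    """
    d = [partial_distance(x, v) for v in prototypes]
    # A sample that coincides with a prototype belongs fully to it.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 1.0 / (m - 1.0)
    inverse = [(1.0 / di) ** power for di in d]
    total = sum(inverse)
    return [w / total for w in inverse]


if __name__ == "__main__":
    sample = [1.0, None, 3.0]  # second attribute is missing
    centers = [[0.0, 5.0, 1.0], [2.0, 0.0, 3.0]]
    print(memberships(sample, centers))
```

In this sketch the missing attribute is simply skipped, so no value is ever imputed; strategies that estimate the missing values instead would replace `partial_distance` with a completion step.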

Cite this article as:

M. Shi and Z. Wang, “A Study of Support Vector Regression-Based Fuzzy c-Means Algorithm on Incomplete Data Clustering,” J. Adv. Comput. Intell. Intell. Inform., Vol.26 No.4, pp. 483-494, 2022.

References

[1] A.-K. Shukla and P.-K. Muhuri, “Big-data clustering with interval type-2 fuzzy uncertainty modeling in gene expression datasets,” Engineering Applications of Artificial Intelligence, Vol.77, pp. 268-282, 2019.
[2] S. Majumder and D.-K. Pratihar, “Multi-sensors data fusion through fuzzy clustering and predictive tools,” Expert Systems with Applications, Vol.107, pp. 165-172, 2018.
[3] J. Arora and M. Tushir, “An Enhanced Spatial Intuitionistic Fuzzy C-Means Clustering for Image Segmentation,” Procedia Computer Science, Vol.167, pp. 646-655, 2020.
[4] X. Song, M. Shi, J. Wu, and W. Sun, “A new fuzzy c-means clustering-based time series segmentation approach and its application on tunnel boring machine analysis,” Mechanical Systems and Signal Processing, Vol.133, Article No.106279, 2019.
[5] C. Peng, Q. Zhang, Z. Kang, C. Chen, and Q. Cheng, “Kernel two-dimensional ridge regression for subspace clustering,” Pattern Recognition, Vol.113, Article No.107749, 2021.
[6] Y. Chen and Z. Yi, “Locality-constrained least squares regression for subspace clustering,” Knowledge-Based Systems, Vol.163, pp. 51-56, 2019.
[7] S. Blažič and I. Škrjanc, “Hybrid system identification by incremental fuzzy c-regression clustering,” IEEE Int. Conf. on Fuzzy Systems (FUZZ-IEEE), pp. 1-7, 2020.
[8] J.-N. Fuhg and A. Fau, “A classification-pursuing adaptive approach for Gaussian process regression on unlabeled data,” Mechanical Systems and Signal Processing, Vol.162, Article No.107976, 2022.
[9] J. Fang, X. Song, N. Yao, and M. Shi, “Application of FCM Algorithm Combined with Artificial Neural Network in TBM Operation Data,” Computer Modeling in Engineering & Sciences, Vol.126, No.1, pp. 397-417, 2021.
[10] M. Shi, T. Zhang, L. Zhang, W. Sun, and X. Song, “A fuzzy c-means algorithm based on the relationship among attributes of data and its application in tunnel boring machine,” Knowledge-Based Systems, Vol.191, Article No.105229, 2020.
[11] J.-K. Dixon, “Pattern recognition with partly missing data,” IEEE Trans. on Systems, Man, and Cybernetics, Vol.9, No.10, pp. 617-621, 1979.
[12] Q. Zhang and Z. Chen, “A distributed weighted possibilistic c-means algorithm for clustering incomplete big sensor data,” Int. J. of Distributed Sensor Networks, Vol.10, No.5, 2014.
[13] T. Furukawa, S. Ohnishi, and T. Yamanoi, “A study on a fuzzy clustering for mixed numerical and categorical incomplete data,” Int. Conf. on Fuzzy Theory and Its Applications, pp. 425-428, 2013.
[14] R.-J. Hathaway and J.-C. Bezdek, “Clustering incomplete relational data using the non-Euclidean relational fuzzy c-means algorithm,” Pattern Recognition Letters, Vol.23, Issue 1-3, pp. 151-160, 2002.
[15] L. Himmelspach and S. Conrad, “Fuzzy clustering of incomplete data based on cluster dispersion,” Proc. of Int. Conf. on Information Processing and Management of Uncertainty in Knowledge-Based Systems, pp. 59-68, 2010.
[16] B. Abidi and S.-B. Yahia, “A new algorithm for fuzzy clustering handling incomplete dataset,” Int. J. on Artificial Intelligence Tools, Vol.23, No.4, Article No.1460012, 2014.
[17] D.-Q. Zhang and S.-C. Chen, “Clustering incomplete data using kernel-based fuzzy c-means algorithm,” Neural Processing Letters, Vol.18, Issue 3, pp. 155-162, 2003.
[18] J.-V. Hulse and T.-M. Khoshgoftaar, “Incomplete-case nearest neighbor imputation in software measurement data,” Information Sciences, Vol.259, pp. 596-610, 2014.
[19] L. Zhang, W. Lu, X. Liu, W. Pedrycz, and C. Zhong, “Fuzzy c-means clustering of incomplete data based on probabilistic information granules of missing values,” Knowledge-Based Systems, Vol.99, pp. 51-70, 2016.
[20] J.-C. Bezdek, R. Ehrlich, and W. Full, “FCM: The fuzzy c-means clustering algorithm,” Computers & Geosciences, Vol.10, Issue 2-3, pp. 191-203, 1984.
[21] R.-J. Hathaway and J.-C. Bezdek, “Fuzzy c-means clustering of incomplete data,” IEEE Trans. on Systems, Man, and Cybernetics, Part B (Cybernetics), Vol.31, Issue 5, pp. 735-744, 2001.
[22] J.-M. Santos and M. Embrechts, “On the use of the adjusted rand index as a metric for evaluating supervised classification,” Proc. of Int. Conf. on Artificial Neural Networks, pp. 175-184, 2009.
[23] P.-A. Estévez, M. Tesmer, C.-A. Perez, and J.-M. Zurada, “Normalized mutual information feature selection,” IEEE Trans. on Neural Networks, Vol.20, Issue 2, pp. 189-201, 2009.
[24] A. Agogino and K. Goebel, “Milling Data Set,” BEST lab, UC Berkeley, NASA Ames Prognostics Data Repository, NASA Ames Research Center, Moffett Field, CA, 2007. http://ti.arc.nasa.gov/project/prognostic-data-repository [accessed August 23, 2020]
[25] A. Tsanas and A. Xifara, “Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools,” Energy and Buildings, Vol.49, pp. 560-567, 2012.
[26] L. Fortuna, A. Rizzo, M. Sinatra, and M.-G. Xibilia, “Soft analyzers for a sulfur recovery unit,” Control Engineering Practice, Vol.11, Issue 12, pp. 1491-1500, 2003.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
