Landmark FN-DBSCAN: An Efficient Density-Based Clustering Algorithm with Fuzzy Neighborhood
Hao Liu, Satoshi Oyama, Masahito Kurihara,
and Haruhiko Sato
Division of Synergetic Information Science in Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo 060-0814, Japan
Clustering is an important tool for data analysis and many clustering techniques have been proposed over the past years. Among them are density-based clustering methods, which have several benefits such as the number of clusters is not required before carrying out clustering; the detected clusters can be represented in an arbitrary shape and outliers can be detected and removed. Recently, the density-based algorithms were extended with the fuzzy set theory, which has made these algorithm more robust. However, the density-based clustering algorithms usually require a time complexity of O(n2) where n is the number of data in the data set, implying that they are not suitable to work with large scale data sets. In this paper, a novel clustering algorithm called landmark fuzzy neighborhood DBSCAN (landmark FN-DBSCAN) is proposed. The concept, landmark, is used to represent a subset of the input data set which makes the algorithm efficient on large scale data sets. We give a theoretical analysis on time complexity and space complexity, which shows both of them are linear to the size of the data set. The experiments show that the landmark FN-DBSCAN is much faster than FN-DBSCAN and provides a very good quality of clustering.
-  R. Xu and D. C.Wunsch II, “Survey of clustering algorithms,” IEEE Trans. Neural Netw., Vol.16, No.3, pp. 645-678, 2005.
-  J. Moody and C. Darken, “Fast learning in networks of locallytuned processing units,” Neural Computation, Vol.1, No.2, pp. 281-294, 1989.
-  M.-C. Chiang, C.-W. Tsai, and C.-S. Yang, “A time-efficient pattern reduction algorithm for k-means clustering,” Information Sciences, Vol.181, No.4, pp. 716-731, 2011.
-  R. T. Ng and J. Han, “Clarans: A method for clustering objects for spatial data mining,” IEEE Trans. Knowl. Data Eng., Vol.14, No.5, pp. 1003-1016, 2002.
-  J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol.1, pp. 281-297, 1967.
-  A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Letters, Vol.31, No.8, pp. 651-666, 2010.
-  H. Zhexue, “Extensions to the k-means algorithm for clustering large data sets with categorical values,” Data Mining and Knowledge Discovery, Vol.304, No.3, pp. 283-304, 1999.
-  C. H. Q. Ding and X. He, “K-nearest-neighbor consistency in data clustering: incorporating local information into global optimization,” in Proc. ACM Symp. on Applied Computing, pp. 584-589, 2004.
-  C.-R. Lin and M.-S. Chen, “Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging,” IEEE Trans. Knowl. Data Eng., Vol.17, No.2, pp. 145-159, 2005.
-  F. Murtagh and P. Contreras, “Algorithms for hierarchical clustering: an overview,” Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, Vol.2, No.1, pp. 86-97, 2012.
-  K.-L. Du, “Clustering: A neural network approach,” Neural Networks, Vol.23, No.1, pp. 89-107, Jan. 2010.
-  S. Theodoridis and K. Koutroumbas, “Pattern Recognition,” Fourth Edition, Academic Press, 2009.
-  Y. Song, S. Jin, and J. Shen, “A unique property of single-link distance and its application in data clustering,” Data & Knowledge Engineering, Vol.70, No.11, pp. 984-1003, 2011.
-  S. Guha, R. Rastogi, and K. Shim, “CURE: an efficient clustering algorithm for large databases,” in Proc. the 1998 ACM SIGMOD Int. Conf. on Management of Data, pp. 73-84, 1998.
-  J. Sander, “Density-based clustering,” in Encyclopedia of Machine Learning, pp. 270-273, 2010.
-  H.-P. Kriegel, P. Kröger, J. Sander, and A. Zimek, “Density-based clustering,” Wiley Interdisc. Rew.: Data Mining and Knowledge Discovery, Vol.1, No.3, pp. 231-240, 2011.
-  M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD-96), pp. 226-231, 1996.
-  M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, “Optics: Ordering points to identify the clustering structure,” in Proc. Int. Conf. on Management of Data and Symposium on Principles of Database Systems Philadelphia (SIGMOD/PODS 1999), New York, NY, pp. 49-60, 1999.
-  P. Viswanath and V. Suresh Babu, “Rough-dbscan: A fast hybrid density based clustering method for large data sets,” Pattern Recogn. Lett., Vol.30, No.16, pp. 1477-1488, Dec. 2009.
-  A. Hinneburg and H.-H. Gabriel, “Denclue 2.0: fast clustering based on kernel density estimation,” in Proc. the 7th Int. Conf. on Intelligent data analysis, ser. IDA’07, Berlin, Heidelberg: Springer-Verlag, pp. 70-80, 2007.
-  B. Borah and D. Bhattacharyya, “A clustering technique using density difference,” in Proc. Int. Conf. on Signal Processing, Communications and Networking 2007, pp. 585-588, Feb. 2007.
-  C. Fraley and A. E. Raftery, “Enhanced model-based clustering, density estimation, and discriminant analysis software: Mclust,” J. Classification, No.2, pp. 263-286, 2003.
-  G. Karypis, E.-H. S. Han, and V. Kumar, “Chameleon: A hierarchical clustering algorithm using dynamic modeling,” IEEE Trans. Comput., 1999.
-  A. Ng, M. Jordan, and Y. Weiss, “On spectral clustering: Analysis and an algorithm,” Advances in Neural Information Processing Systems, Vol.14, No.2, pp. 849-856, 2001.
-  I. S. Dhillon, Y. Guan, and B. Kulis, “Kernel k-means: spectral clustering and normalized cuts,” in Proc. of the tenth ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, pp. 551-556, 2004.
-  E. N. Nasibov, “A robust algorithm for solution of the fuzzy clustering problem on the basis of the fuzzy joint points method,” Cybernetics and Sys. Anal., Vol.44, No.1, pp. 7-17, Jan. 2008.
-  E. N. Nasibov and G. Ulutagay, “A new unsupervised approach for fuzzy clustering,” Fuzzy Sets and Systems, Vol.158, No.19, pp. 2118-2133, 2007.
-  E. N. Nasibov and G. Ulutagay, “Robustness of density-based clustering methods with various neighborhood relations,” Fuzzy Sets and Systems, Vol.160, No.24, pp. 3601-3615, 2009.
-  M. Halkidi, Y. Batistakis, and M. Vazirgiannis, “Cluster validity methods: Part I,” SIGMOD Record, Vol.31, No.2, pp. 40-45, 2002.