# A Similarity Rough Set Model for Document Representation and Document Clustering

## Nguyen Chi Thanh, Koichi Yamada, and Muneyuki Unehara

Department of Management and Information System Science, Nagaoka University of Technology, 1603-1 Kamitomioka, Nagaoka, Niigata 940-2188, Japan

*J. Adv. Comput. Intell. Intell. Inform.*, Vol.15 No.2, pp. 125-133, 2011.

- [1] T. B. Ho and K. Funakoshi, “Information retrieval using rough sets,” J. of Japanese Society for Artificial Intelligence, Vol.13, No.3, pp. 424-433, 1997.
- [2] Y. Zhao and G. Karypis, “Hierarchical clustering algorithms for document datasets,” Data Mining and Knowledge Discovery, Vol.10, No.2, pp. 141-168, 2005.
- [3] I. S. Dhillon and D. S. Modha, “Concept decompositions for large sparse text data using clustering,” Machine Learning, Vol.42, No.1-2, pp. 143-175, 2001.
- [4] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” Proc. of the KDD Workshop on Text Mining, 2000.
- [5] Y. Li, S. M. Chung, and J. D. Holt, “Text document clustering based on frequent word meaning sequences,” Data and Knowledge Engineering, Vol.64, No.1, pp. 381-404, 2008.
- [6] M. Mahdavi and H. Abolhassani, “Harmony K-means algorithm for document clustering,” Data Mining and Knowledge Discovery, pp. 1-22, 2008.
- [7] G. Karypis, “CLUTO – A Clustering Toolkit,” 2003. http://glaros.dtc.umn.edu/gkhome/cluto/cluto/download
- [8] T. B. Ho and N. B. Nguyen, “Nonhierarchical document clustering based on a tolerance rough set model,” Int. J. of Intelligent Systems, Vol.17, No.2, pp. 199-212, 2002.
- [9] X.-J. Meng, Q.-C. Chen, and X.-L. Wang, “A tolerance rough set based semantic clustering method for web search results,” Information Technology J., Vol.8, No.4, pp. 453-464, 2009.
- [10] Z. Pawlak, “Rough sets,” Int. J. of Information and Computer Sciences, Vol.11, No.5, pp. 341-356, 1982.
- [11] Y. Y. Yao, S. K. M. Wong, and T. Y. Lin, “A review of rough set models,” Rough Sets and Data Mining: Analysis for Imprecise Data, pp. 47-73, 1997.
- [12] R. Slowinski and D. Vanderpooten, “A generalized definition of rough approximations based on similarity,” IEEE Trans. on Knowledge and Data Engineering, Vol.12, No.2, pp. 331-336, 2000.
- [13] R. Slowinski and D. Vanderpooten, “Similarity relation as a basis for rough approximations,” Advances in Machine Intelligence and Soft Computing, Vol.4, pp. 17-33, 1997.
- [14] J. Stefanowski and A. Tsoukias, “Incomplete Information Tables and Rough Classification,” Computational Intelligence, Vol.17, No.3, pp. 545-566, 2001.
- [15] R. D. Luce, “Semiorders and a Theory of Utility Discrimination,” Econometrica, Vol.24, No.2, pp. 178-191, 1956.
- [16] A. Strehl, J. Ghosh, and R. Mooney, “Impact of similarity measures on web-page clustering,” Proc. of the 17th National Conf. on Artificial Intelligence: Workshop of Artificial Intelligence for Web Search (AAAI 2000), Austin, TX, pp. 58-64, July 2000.
- [17] G. Salton and M. J. McGill, “Introduction to modern information retrieval,” McGraw-Hill Book Company, 1983.
- [18] ftp://ftp.cs.cornell.edu/pub/smart

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.