
JACIII Vol.14 No.6 pp. 624-630
doi: 10.20965/jaciii.2010.p0624


Applying Naive Bayes Classifier to Document Clustering

Jie Ji and Qiangfu Zhao

System Intelligence Lab., The University of Aizu, Tsuruga, Ikki-machi, Aizu-wakamatsu, Fukushima 965-8580, Japan

Received: January 29, 2010
Accepted: July 15, 2010
Published: September 20, 2010
Keywords: document clustering, Naive Bayes Classifier, Iterative Bayes Clustering, k-means, comparative advantage
Document clustering partitions a set of unlabeled documents so that the documents in each cluster share common concepts. A Naive Bayes Classifier (BC) is a simple probabilistic classifier that applies Bayes’ theorem under strong (naive) independence assumptions. BC requires only a small amount of training data to estimate the parameters needed for classification. Because that training data must be labeled, whereas clustering starts from unlabeled documents, we propose an Iterative Bayes Clustering (IBC) algorithm. To improve IBC performance, we propose combining IBC with a Comparative Advantage-based (CA) initialization method. Experimental results show that our proposal improves performance significantly over classical clustering methods.
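The abstract does not spell out the IBC procedure, so the following is only a minimal sketch of one plausible reading: start from some initial labeling, train a multinomial Naive Bayes model on the current labels, reclassify every document, and repeat until the labels stop changing. The function names, the Laplace smoothing, the toy documents, and the hand-written initial labels (a stand-in for the CA initialization, which is not described here) are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def train_nb(docs, labels, k, vocab):
    """Estimate smoothed log-priors and log-likelihoods from the current labels."""
    counts = [Counter() for _ in range(k)]
    sizes = [0] * k
    for doc, c in zip(docs, labels):
        counts[c].update(doc)
        sizes[c] += 1
    n = len(docs)
    # Laplace (add-one) smoothing is an assumption of this sketch.
    log_prior = [math.log((sizes[c] + 1) / (n + k)) for c in range(k)]
    log_like = []
    for c in range(k):
        total = sum(counts[c].values()) + len(vocab)
        log_like.append({w: math.log((counts[c][w] + 1) / total) for w in vocab})
    return log_prior, log_like


def classify(doc, log_prior, log_like):
    """Return the index of the cluster with the highest posterior log-score."""
    scores = [lp + sum(ll[w] for w in doc) for lp, ll in zip(log_prior, log_like)]
    return max(range(len(scores)), key=scores.__getitem__)


def iterative_bayes_clustering(docs, init_labels, k, max_iter=50):
    """Alternate between training on current labels and relabeling all documents."""
    vocab = sorted({w for d in docs for w in d})
    labels = list(init_labels)
    for _ in range(max_iter):
        log_prior, log_like = train_nb(docs, labels, k, vocab)
        new_labels = [classify(d, log_prior, log_like) for d in docs]
        if new_labels == labels:  # converged: no document changed cluster
            break
        labels = new_labels
    return labels


# Toy corpus (hypothetical): three sports documents, three finance documents.
docs = [
    ["ball", "goal", "team"],
    ["team", "goal", "match"],
    ["ball", "match", "team"],
    ["stock", "market", "price"],
    ["price", "market", "trade"],
    ["stock", "trade", "price"],
]
# Hypothetical initial labeling with one deliberate mistake (the last document).
init_labels = [0, 0, 0, 1, 1, 0]
result = iterative_bayes_clustering(docs, init_labels, k=2)
```

In this sketch the quality of the final clustering depends heavily on the initial labels, which is consistent with the abstract's motivation for pairing IBC with a dedicated initialization method.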
Cite this article as:
J. Ji and Q. Zhao, “Applying Naive Bayes Classifier to Document Clustering,” J. Adv. Comput. Intell. Intell. Inform., Vol.14 No.6, pp. 624-630, 2010.
