Improving Text Categorization by Multicriteria Feature Selection
Son Doan* and Susumu Horiguchi**
*Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
**Graduate School of Information Science, Tohoku University, 6-3-09 Aoba, Sendai 980-8579, Japan
Received: October 28, 2004; Accepted: May 9, 2005; Published: September 20, 2005
Keywords: feature selection, text categorization, machine learning, text mining, text representation
Text categorization involves assigning a natural language document to one or more predefined classes. One of the most interesting issues in this task is feature selection. We propose an approach using multicriteria ranking of features, a new procedure for feature selection, and apply it to text categorization. Experimental results on the Reuters-21578 and 20Newsgroups benchmark data sets with the naive Bayes algorithm show that our proposal outperforms conventional feature selection in text categorization performance.
Cite this article as: S. Doan and S. Horiguchi, “Improving Text Categorization by Multicriteria Feature Selection,” J. Adv. Comput. Intell. Intell. Inform., Vol.9 No.5, pp. 570-575, 2005.