Improving Text Categorization by Multicriteria Feature Selection
Son Doan* and Susumu Horiguchi**
*Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
**Graduate School of Information Science, Tohoku University, 6-3-09 Aoba, Sendai 980-8579, Japan
Text categorization involves assigning a natural language document to one or more predefined classes. A central issue in this task is feature selection. We propose an approach that uses multicriteria ranking of features together with a new procedure for feature selection, and apply both to text categorization. Experiments on the Reuters-21578 and 20Newsgroups benchmark data sets with the naive Bayes algorithm show that our proposal outperforms conventional feature selection in text categorization performance.