Label Propagation for Text Classification Using Latent Topics
Akiko Eriguchi and Ichiro Kobayashi
Advanced Sciences, Graduate School of Humanities and Sciences, Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan
The objective of this paper is to improve the accuracy of multiclass text classification through Graph-Based Semi-Supervised Learning (GBSSL). In GBSSL, it is essential to construct a graph that properly expresses the relations among nodes. We propose a method that constructs a similarity graph using both surface information and latent information to express the similarity between nodes. Experiments on the Reuters-21578 corpus confirm that the proposed method improves the accuracy of GBSSL in a multiclass text classification task.
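As a rough illustration of the idea, the sketch below builds a similarity graph from two views of each document and then spreads labels over it. The abstract does not say how the two kinds of information are combined, so everything here is an assumption made for illustration: cosine similarity for both views, a linear mix controlled by a hypothetical weight `alpha`, a simple iterative propagation that keeps labeled nodes fixed, and the toy word-count and topic-proportion vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mixed_similarity_graph(surface, latent, alpha=0.5):
    """Edge weight = alpha * surface similarity + (1 - alpha) * latent similarity.

    The linear mix and `alpha` are illustrative assumptions, not the paper's formula.
    """
    n = len(surface)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops
                weights[i][j] = (alpha * cosine(surface[i], surface[j])
                                 + (1 - alpha) * cosine(latent[i], latent[j]))
    return weights


def propagate_labels(weights, seeds, num_classes, iterations=50):
    """Iterative label propagation; `seeds` maps node index -> class and stays clamped."""
    n = len(weights)
    scores = [[0.0] * num_classes for _ in range(n)]
    for node, cls in seeds.items():
        scores[node][cls] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if i in seeds or total == 0:
                updated.append(scores[i][:])  # labeled or isolated node: keep as is
                continue
            # Each unlabeled node takes the weighted average of its neighbors' scores.
            updated.append([
                sum(weights[i][j] * scores[j][c] for j in range(n)) / total
                for c in range(num_classes)
            ])
        scores = updated
    return [max(range(num_classes), key=lambda c: scores[i][c]) for i in range(n)]


# Toy data: four documents, two of them labeled.
surface = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]  # word counts
latent = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]            # topic proportions
graph = mixed_similarity_graph(surface, latent, alpha=0.5)
predictions = propagate_labels(graph, seeds={0: 0, 2: 1}, num_classes=2)
print(predictions)
```

In this toy setting each unlabeled document ends up with the class of the seed it most resembles in both views. Setting `alpha` to 1.0 or 0.0 falls back to a graph built from surface or latent information alone, which is one way to compare the two sources.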
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.