A Neural N-Gram Network for Text Classification

Zhenguo Yan; Yue Wu

doi:10.20965/jaciii.2018.p0380

JACIII Vol.22 No.3 pp. 380-386

(2018)

Paper:

Department of Computer Engineering and Science, Shanghai University
99 Shangda Road, BaoShan District, Shanghai, China

Received:

November 5, 2017

Accepted:

March 22, 2018

Published:

May 20, 2018

Keywords:

natural language processing, text classification, neural networks, n-gram

Abstract

Convolutional neural networks (CNNs) effectively extract local features from input data. However, CNNs built on word embeddings and convolution layers perform poorly in text classification tasks compared with traditional baseline methods. To address this problem, we propose a model named NNGN that simplifies the CNN by replacing its convolution layer with a pooling layer, which extracts n-gram embeddings in a simpler way and obtains document representations via linear computation. We implement two settings for extracting n-gram features. The first, seq-NNGN, takes word order within each n-gram into account; the second, BoW-NNGN, ignores it. We compare the performance of both settings with that of other models on several classification tasks. The experimental results show that the proposed model outperforms state-of-the-art models.
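
The difference between an order-aware and an order-free n-gram embedding can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the pooling operator, so averaging for the bag-of-words variant, concatenation for the sequential variant, averaging over n-grams for the document vector, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def ngram_windows(token_ids, n):
    """Return an array of shape (num_ngrams, n) with every contiguous n-gram."""
    ids = np.asarray(token_ids)
    if len(ids) < n:
        raise ValueError("document is shorter than n")
    return np.stack([ids[i:i + n] for i in range(len(ids) - n + 1)])

def bow_ngram_embeddings(token_ids, emb, n):
    """Order-free variant (assumed): average the word vectors in each window."""
    windows = ngram_windows(token_ids, n)
    return emb[windows].mean(axis=1)               # (num_ngrams, dim)

def seq_ngram_embeddings(token_ids, emb, n):
    """Order-aware variant (assumed): concatenate the word vectors in each window."""
    windows = ngram_windows(token_ids, n)
    return emb[windows].reshape(len(windows), -1)  # (num_ngrams, n * dim)

def document_vector(ngram_embs):
    """Assumed linear document representation: mean over all n-gram embeddings."""
    return ngram_embs.mean(axis=0)

def classify(doc_vec, W, b):
    """Linear classifier on top of the document vector; returns the class index."""
    return int(np.argmax(doc_vec @ W + b))
```

In this sketch, swapping the two words of a bigram leaves the bag-of-words embedding unchanged but alters the sequential one, which is the distinction the abstract draws between BoW-NNGN and seq-NNGN.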

Cite this article as:

Z. Yan and Y. Wu, “A Neural N-Gram Network for Text Classification,” J. Adv. Comput. Intell. Intell. Inform., Vol.22 No.3, pp. 380-386, 2018.

References

[1] S. Wang and C. D. Manning, “Baselines and bigrams: Simple, good sentiment and topic classification,” Procs. of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, Vol.2, 2012.
[2] A. Joulin et al., “Bag of tricks for efficient text classification,” arXiv preprint arXiv:1607.01759, 2016.
[3] G. Mesnil et al., “Ensemble of generative and discriminative techniques for sentiment analysis of movie reviews,” arXiv preprint arXiv:1412.5335, 2014.
[4] Q. Le and T. Mikolov, “Distributed representations of sentences and documents,” Procs. of the 31st Int. Conf. on Machine Learning (ICML-14), 2014.
[5] S. Hochreiter et al., “Gradient flow in recurrent nets: the difficulty of learning long-term dependencies,” A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001.
[6] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, Vol.9, No.8, pp. 1735-1780, 1997.
[7] A. M. Dai and Q. V. Le, “Semi-supervised sequence learning,” Advances in Neural Information Processing Systems, pp. 3079-3087, 2015.
[8] Y. Kim, “Convolutional neural networks for sentence classification,” arXiv preprint arXiv:1408.5882, 2014.
[9] X. Zhang, J. Zhao, and Y. LeCun, “Character-level convolutional networks for text classification,” Advances in Neural Information Processing Systems, pp. 649-657, 2015.
[10] N. Kalchbrenner, E. Grefenstette, and P. Blunsom, “A convolutional neural network for modeling sentences,” Procs. of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014.
[11] T. Mikolov et al., “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, 2013.
[12] R. Johnson and T. Zhang, “Effective use of word order for text categorization with convolutional neural networks,” arXiv preprint arXiv:1412.1058, 2014.
[13] A. M. Saxe, J. L. McClelland, and S. Ganguli, “Exact solutions to the nonlinear dynamics of learning in deep linear neural networks,” arXiv preprint arXiv:1312.6120, 2013.
[14] G. E. Hinton et al., “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580, 2012.
[15] A. L. Maas et al., “Learning word vectors for sentiment analysis,” Procs. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies – Volume 1, Association for Computational Linguistics, 2011.
[16] J. McAuley and J. Leskovec, “Hidden factors and hidden topics: understanding rating dimensions with review text,” Procs. of the 7th ACM Conf. on Recommender Systems, 2013.
[17] R. Johnson and T. Zhang, “Semi-supervised convolutional neural networks for text categorization via region embedding,” Advances in Neural Information Processing Systems, pp. 919-927, 2015.
[18] R. Johnson and T. Zhang, “Supervised and semi-supervised text categorization using LSTM for region embeddings,” Int. Conf. on Machine Learning, 2016.
[19] K. Lang, “Newsweeder: Learning to filter netnews,” Procs. of the 12th Int. Conf. on Machine Learning, Vol.10, 1995.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.
