Barrage Text Classification with Improved Active Learning and CNN
Ningjia Qiu, Lin Cong, Sicheng Zhou, and Peng Wang
School of Computer Science and Technology, Changchun University of Science and Technology
No.7186 Weixing Road, Changchun, Jilin 130022, China
Traditional convolutional neural networks (CNNs) use a pooling layer to reduce the dimensionality of text representations, but pooling discards semantic information. To address this problem, this paper proposes a convolutional neural network model based on the singular value decomposition algorithm (SVD-CNN). First, an improved density-based center point clustering active learning sampling algorithm (DBC-AL) is used to obtain a high-quality training set at a low labelling cost. Second, in place of a pooling layer, the method uses singular value decomposition for feature extraction and dimensionality reduction, fuses the resulting reduced-dimension matrix, and completes the barrage text classification task. Finally, the partial sampling gradient descent algorithm (PSGD) is applied to optimize the model parameters, which accelerates convergence while keeping training stable. To verify the effectiveness of the improved algorithm, the proposed model was compared with common text classification models on several barrage datasets. The experimental results show that the improved algorithm preserves the semantic features of the text better, keeps the training process stable, and speeds up convergence. Furthermore, the model’s classification performance on different barrage texts is superior to that of traditional algorithms.
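The contrast between pooling and SVD-based reduction can be illustrated with a minimal NumPy sketch. It is not the SVD-CNN layer itself: the abstract does not specify the layer's inputs, the rank, or the fusion step, so the feature-map shape, the rank `k`, and the helper names `svd_reduce` and `max_pool` are assumptions made only for illustration.

```python
import numpy as np


def svd_reduce(feature_map: np.ndarray, k: int) -> np.ndarray:
    """Project an (n_positions, n_filters) feature map onto its top-k singular directions.

    Hypothetical helper: a plain truncated SVD, used here as a stand-in for
    the dimensionality-reduction step described in the abstract.
    """
    # Economy SVD: u is (n, r), s is (r,), vt is (r, m), with r = min(n, m).
    u, s, vt = np.linalg.svd(feature_map, full_matrices=False)
    k = min(k, s.shape[0])
    # u[:, :k] * s[:k] equals feature_map @ vt[:k].T, i.e. the coordinates of
    # every position in the k-dimensional subspace that retains the most variance.
    return u[:, :k] * s[:k]


def max_pool(feature_map: np.ndarray) -> np.ndarray:
    """Max-over-time pooling: keeps one value per filter, discarding position."""
    return feature_map.max(axis=0)


# Assumed toy input: 20 text positions, 8 convolution filters.
rng = np.random.default_rng(0)
fm = rng.standard_normal((20, 8))

reduced = svd_reduce(fm, k=3)
pooled = max_pool(fm)

print(reduced.shape)  # (20, 3): every position is kept, in fewer dimensions
print(pooled.shape)   # (8,): positions are collapsed to a single maximum
```

In this sketch, max pooling keeps a single activation per filter, whereas the truncated SVD keeps a low-rank representation of all positions, which is one way to read the abstract's claim that more semantic information is preserved.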