
JACIII Vol.23 No.2 pp. 268-273
doi: 10.20965/jaciii.2019.p0268
(2019)

Paper:

Research on Policy Text Clustering Algorithm Based on LDA-Gibbs Model

Haiqun Ma*,** and Tao Zhang***,†

*Center for Russian Language Literature and Culture, Heilongjiang University
Harbin, Heilongjiang 150080, China

**Research Center of Information Resource Management, Heilongjiang University
Harbin, Heilongjiang 150080, China

***Information and Network Center, Heilongjiang University
Harbin, Heilongjiang 150080, China

†Corresponding author

Received: May 31, 2018
Accepted: July 24, 2018
Published: March 20, 2019
Keywords: LDA-Gibbs, topic model, text clustering, weighted algorithm
Abstract

Policy texts contain large amounts of diverse data and strictly conform to standards and specifications, but traditional text clustering methods cannot cope with their high dimensionality, sparse features, and near-synonymous terms. This paper therefore proposes a weighted algorithm based on the LDA-Gibbs model to improve the accuracy of policy text clustering. First, it gives a realistic basis for the assumptions behind the LDA-Gibbs topic model and the weighted algorithm. Second, it preprocesses existing simulated policy text data, builds the LDA-Gibbs model, forms the weighted algorithm, and generates training data to determine the optimal number of topics in the LDA-Gibbs model and to complete the final clustering of the policy texts. Finally, by summarizing, classifying, and drawing conclusions from the experimental data, it demonstrates the objective validity and effectiveness of the method. The authors hope that the overall design can be applied to prospective studies for formulating new policies, to the retrospective evaluation and testing of existing policies, and to the formation of a two-way interactive mechanism.
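As an illustration of the core step named in the abstract, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, followed by a simple clustering that assigns each document to its dominant topic. It is not the authors' implementation: the paper's weighted algorithm and its procedure for choosing the optimal number of topics are not specified in the abstract and are left out here. The function names, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA on tokenized documents with collapsed Gibbs sampling.

    Returns (doc_topic_counts, topic_word_counts, vocab).
    alpha and beta are symmetric Dirichlet priors (illustrative values).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    n_dk = [[0] * num_topics for _ in docs]              # document-topic counts
    n_kw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0] * num_topics                               # tokens per topic
    z = []                                               # topic of each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][v] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    return n_dk, n_kw, vocab


def cluster_by_dominant_topic(doc_topic_counts):
    """Label each document with the topic that holds most of its tokens."""
    return [
        max(range(len(row)), key=row.__getitem__) for row in doc_topic_counts
    ]


# Toy, hand-made "policy" documents (hypothetical data, already tokenized).
docs = [
    ["tax", "subsidy", "enterprise", "tax", "finance"],
    ["finance", "subsidy", "tax", "enterprise"],
    ["school", "teacher", "education", "student"],
    ["education", "student", "school", "curriculum"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
labels = cluster_by_dominant_topic(doc_topic)
print(labels)
```

Taking the dominant topic as the cluster label is only the simplest way to turn document-topic counts into clusters; a weighting scheme such as the one the paper proposes would act on these distributions before the final assignment.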

Cite this article as:
H. Ma and T. Zhang, “Research on Policy Text Clustering Algorithm Based on LDA-Gibbs Model,” J. Adv. Comput. Intell. Intell. Inform., Vol.23 No.2, pp. 268-273, 2019.

