Topic Evolution Analysis Based on Cluster Topic Model
Yaoyi Xi, Gang Chen, Bicheng Li, and Yongwang Tang
Zhengzhou Information Science and Technology Institute
Zhengzhou 450001, China
Topic evolution analysis helps to understand how topics evolve or develop along the timeline. Existing studies do not mine latent semantic information in depth and require the number of clusters to be determined in advance. To address these problems, this paper proposes a method for topic evolution analysis based on a cluster topic model. Firstly, a new topic model, namely the cluster topic model, is built to perform document clustering while mining latent semantic information. Secondly, events are detected according to the cluster label of each document, and the evolution relationship between any two events is identified based on the aspect distributions of their documents. Finally, by choosing a representative document for each event, a topic evolution graph is constructed to display the development of the topic along the timeline. Experiments show that the proposed method outperforms comparable techniques from previous work.
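The second and third steps can be illustrated with a minimal sketch. It assumes the cluster topic model has already assigned each document a cluster label and an aspect distribution. The abstract does not specify how evolution relationships are scored or how representative documents are chosen, so the cosine similarity, the `threshold` value, the mean-distribution event profile, and all function and field names below are illustrative assumptions rather than the authors' method.

```python
import math
from collections import defaultdict


def detect_events(docs):
    """Group documents into events by their cluster label."""
    events = defaultdict(list)
    for doc in docs:
        events[doc["cluster"]].append(doc)
    return dict(events)


def mean_aspects(event_docs):
    """Average the aspect distributions of an event's documents (assumed event profile)."""
    n = len(event_docs)
    dim = len(event_docs[0]["aspects"])
    return [sum(d["aspects"][k] for d in event_docs) / n for k in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def evolution_edges(events, threshold=0.8):
    """Link an earlier event to a later one when their profiles are similar.

    The similarity measure and threshold are assumptions made for this sketch.
    """
    start = {e: min(d["time"] for d in ds) for e, ds in events.items()}
    profile = {e: mean_aspects(ds) for e, ds in events.items()}
    edges = []
    for a in events:
        for b in events:
            if start[a] < start[b] and cosine(profile[a], profile[b]) >= threshold:
                edges.append((a, b))
    return sorted(edges)


def representative(event_docs):
    """Pick the document whose aspect distribution is closest to the event profile."""
    center = mean_aspects(event_docs)
    return max(event_docs, key=lambda d: cosine(d["aspects"], center))["id"]
```

Each document here is a dictionary with hypothetical keys `id`, `cluster`, `time`, and `aspects`. The edges returned by `evolution_edges`, together with one representative document per event, are the ingredients of an evolution graph that a graph drawing tool could then lay out along the timeline.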