Topic Model Based New Event Detection Within Topics

Yaoyi Xi; Bicheng Li; Yongwang Tang

doi:10.20965/jaciii.2016.p0467

JACIII Vol.20 No.3 pp. 467-476

(2016)

Paper:

Zhengzhou Information Science and Technology Institute
Zhengzhou 450002, China

Received: January 10, 2016
Accepted: March 22, 2016
Published: May 19, 2016

Keywords: new event detection, topic evolution, hierarchical Dirichlet process, sequential Gibbs sampling

Abstract

Traditional new event detection, first proposed in Topic Detection and Tracking (TDT), is in effect first event detection. However, a topic usually consists of many events. Automatically detecting each event in a topic as soon as it appears, not only the first but also the second, the third, and so on, helps users correctly understand the main development trend of the topic. In this paper, we address new event detection within a single topic and propose a novel topic model that detects new events as the topic evolves. Rather than measuring the similarity between content items by lexical overlap, our model treats new event detection as the identification of novel semantic aspects within a topic. It also determines the appropriate number of aspects automatically and adapts naturally to changes in the vocabulary as the topic evolves. Posterior inference uses a sequential Gibbs sampling algorithm, which enables online new event detection. Experimental results show that the proposed technique outperforms comparable techniques from previous work.
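The abstract frames new event detection as spotting a document that opens a previously unseen semantic aspect, with the number of aspects left open by a Dirichlet-process prior and documents processed in arrival order. The sketch below illustrates only that idea; it is not the authors' model or sampler. It assumes a simplified single-level Dirichlet-process mixture of multinomials (a Chinese-restaurant-process prior over aspects with Dirichlet-multinomial word likelihoods) instead of the full hierarchical Dirichlet process, samples each arriving document's aspect once given earlier assignments without revisiting them, and, unlike the model described, assumes a fixed vocabulary size. The class `OnlineAspectDetector` and the parameters `alpha`, `beta`, and `vocab_size` are hypothetical.

```python
import math
import random
from collections import Counter


class OnlineAspectDetector:
    """Hypothetical sketch: flag a document as a new event when it opens a new aspect.

    Simplified single-level Dirichlet-process mixture of multinomials, not the
    paper's HDP model. Each document is sampled once, in arrival order.
    """

    def __init__(self, alpha=1.0, beta=0.01, vocab_size=10_000, seed=0):
        self.alpha = alpha            # concentration: willingness to open a new aspect
        self.beta = beta              # symmetric Dirichlet prior on aspect word distributions
        self.vocab_size = vocab_size  # assumption: fixed vocabulary size
        self.rng = random.Random(seed)
        self.doc_counts = []          # documents assigned to each aspect
        self.word_counts = []         # per-aspect Counter of word frequencies
        self.totals = []              # per-aspect token totals

    def _log_marginal(self, doc, k=None):
        """log p(doc | aspect k) with the word distribution integrated out; k=None is a new aspect."""
        counts = Counter() if k is None else self.word_counts[k]
        total = 0 if k is None else self.totals[k]
        vb = self.vocab_size * self.beta
        lp = math.lgamma(total + vb) - math.lgamma(total + sum(doc.values()) + vb)
        for w, c in doc.items():
            lp += math.lgamma(counts[w] + c + self.beta) - math.lgamma(counts[w] + self.beta)
        return lp

    def predictive(self, tokens):
        """Normalised probabilities over [existing aspects..., new aspect]."""
        doc = Counter(tokens)
        log_w = [math.log(m) + self._log_marginal(doc, k)
                 for k, m in enumerate(self.doc_counts)]
        log_w.append(math.log(self.alpha) + self._log_marginal(doc))
        top = max(log_w)  # subtract the max before exp() for numerical stability
        w = [math.exp(x - top) for x in log_w]
        z = sum(w)
        return [x / z for x in w]

    def observe(self, tokens):
        """Sample an aspect for one arriving document; return (aspect_id, is_new_event)."""
        probs = self.predictive(tokens)
        k = self.rng.choices(range(len(probs)), weights=probs)[0]
        is_new = k == len(self.doc_counts)  # the "new aspect" slot was drawn
        if is_new:
            self.doc_counts.append(0)
            self.word_counts.append(Counter())
            self.totals.append(0)
        doc = Counter(tokens)
        self.doc_counts[k] += 1
        self.word_counts[k].update(doc)
        self.totals[k] += sum(doc.values())
        return k, is_new


if __name__ == "__main__":
    det = OnlineAspectDetector(alpha=1.0, beta=1e-4, vocab_size=1000, seed=0)
    stream = [
        "earthquake strikes city buildings collapse".split(),
        "earthquake city buildings collapse rescue".split(),
        "aid donations arrive relief funds pledged".split(),
    ]
    for tokens in stream:
        print(det.observe(tokens))
```

Working in log space keeps the products of Gamma-function ratios from underflowing, and the new-aspect slot is weighted by `alpha` times the document's likelihood under the prior alone, so a document sharing little vocabulary with every existing aspect tends to open a new one.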

Cite this article as:

Y. Xi, B. Li, and Y. Tang, “Topic Model Based New Event Detection Within Topics,” J. Adv. Comput. Intell. Intell. Inform., Vol.20 No.3, pp. 467-476, 2016.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.

[1] [1] J. Allan, Topic detection and tracking: event-based information organization, Springer, 2002.

[2] [2] T. Brants, F. Chen, and A. Farahat, “A system for new event detection,” Proc. of the 26th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, pp. 330-337, 2003.

[3] [3] G. Kumaran and J. Allan, “Using names and topics for new event detection,” Proc. of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 121-128, 2005.

[4] [4] Y. Yang, T. Pierce, and J. Carbonell, “A study of retrospective and on-line event detection,” Proc. of the 21st Annual Int. ACM SIGIR Conf. on Research and development in information retrieval, ACM, pp. 28-36, 1998.

[5] [5] X. Guo, Y. Xiang, Q. Chen, et al., “LDA-based online topic detection using tensor factorization,” J. of Information Science, Vol.39, No.4, pp. 459-469, 2013.

[6] [6] G. Luo, C. Tang, and P. S. Yu, “Resource-adaptive real-time new event detection,” Proc. of the 2007 ACM SIGMOD Int. Conf. on Management of data, ACM, pp. 497-508, 2007.

[7] [7] Y. Hu, L. Bai, and W. Zhang, “Modeling and Analyzing Topic Evolution,” ACTA AUTOMATICA SINICA, Vol.38, No.10, pp. 1690-1697, 2012.

[8] [8] Y. Hu, L. Bai, and W. Zhang, “OLDA-based method for online topic evolution in network public opinion analysis,” J. of National University of Defense Technology, Vol.34, No.1, pp. 150-154, 2012.

[9] [9] J. H. Lau, N. Collier, and T. Baldwin, “On-line Trend Analysis with Topic Models: #Twitter Trends Detection Topic Model Online,” Proc. of the 24th Int. Conf. on Computational Linguistics, pp. 1519-1534, 2012.

[10] [10] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Beli, “Hierarchical Dirichlet processes,” J. of the American Statistical Association, Vol.101, No.476, pp. 1566-1581, 2006.

[11] [11] M. Serizawa and I. Kobayashi, “Topic Tracking Based on Identifying Proper No.of the Latent Topics in Documents,” J. of Advanced Computational Intelligence and Intelligent Informatics (JACIII), Vol.16, No.5, pp. 611-618, 2012.

[12] [12] L. Huang and L. Huang, “Optimized Event Storyline Generation based on Mixture-Event-Aspect Model,” Proc. of the 2013 Conf. on Empirical Methods in Natural Language Processing, pp. 726-735, 2013.

[13] [13] S. Xu, S. Wang, and Y. Zhang, “Summarizing Complex Events: a Cross-modal Solution of Storylines Extraction and Reconstruction,” Proc. of the 2013 Conf. on Empirical Methods in Natural Language Processing, pp. 1281-1291, 2013.

[14] [14] J. Li and S. Li, “Evolutionary Hierarchical Dirichlet Process for Timeline Summarization,” Proc. of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 556-560, 2013.

[15] [15] A. Ahmed and E. P. Xing, “Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream,” Proc. of the 26th Int. Conf. on Uncertainty in Artificial Intelligence, 2010.

[16] [16] S. Petrovi'c, M. Osborne, and V. Lavrenko, “Streaming first story detection with application to twitter,” Human Language Technologies: The 2010 Annual Conf. of the North American Chapter of the Association for Computational Linguistics, Association for Computational Linguistics, pp. 181-189, 2010.

[17] [17] X. Wang, F. Zhu, J. Jiang, and S. Li, “Real time event detection in twitter, Web-Age Information Management,” Springer Berlin Heidelberg, pp. 502-513, 2013.

[18] [18] Soboroff and D. Harman, “Novelty detection: the trec experience,” Proc. of the Conf. on Human Language Technology and Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 105-112, 2005.

[19] [19] Y. Zhang, J. Callan, and T. Minka, “Novelty and redundancy detection in adaptive filtering,” Proc. of the 25th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, ACM, pp. 81-88, 2002.

[20] [20] X. Li, L. Du, and Y. Shen, “Update Summarization via Graph-Based Sentence Ranking,” IEEE Trans. on Knowledge and Data Engineering, Vol.25, No.5, pp. 1162-1174, 2013.

[21] [21] J. Li, S. Li, X. Wang, Y. Tian, and B. Chang, “Update Summarization Using a Multi-level Hierarchical Dirichlet Process Model,” Proc. of the 24th Int. Conf. on Computational Linguistics, pp. 1603-1618, 2012.

[22] [22] D. M. Blei and J. D. Laerty, “Dynamic topic models,” ICML, pp. 113-120, 2006.

[23] [23] The 2004 Topic Detection and Tracking (TDT2004) Task Definition and Evaluation Plan [H], version 1.2,
http://www. nist. gov.

[24] [24] J. Allan, V. Lavrenko, D. Malin, and R. Swan, “Detections, bounds, and timelines: Umass and tdt-3,” Proc. of Topic Detection and Tracking Workshop, pp. 167-174, 2000.