Paper:
Empirical Research of Hot Topic Recognition and its Evolution Path Method for Scientific and Technological Literature
Lei Jiang*, Tao Zhang**,†, and Taihua Huang**
*Information and Network Center, Heilongjiang University
Harbin, Heilongjiang 150080, China
**School of Information Management, Heilongjiang University
Harbin, Heilongjiang 150080, China
†Corresponding author
With the advent of the big data era, recognizing hot topics at the research frontier of a scientific and technological field and analyzing their evolution paths have received widespread attention from the academic community. Such analysis can reveal not only the development trend of a field, but also the laws by which topic content evolves across the field's development stages. However, current methods still suffer from problems such as inaccurate recognition of hot topics and unclear evolution paths, which seriously affect the comprehensiveness and accuracy of the analysis. To address these problems, this paper proposes a method for hot topic recognition and evolution analysis of scientific and technological literature based on the Latent Dirichlet Allocation (LDA) model. The method aims to reveal how topic content evolves across the development stages of a field, through trends such as inheritance, merging, and division, so as to provide decision support for domain knowledge innovation services. The main research process is as follows. First, LDA is used to extract global topics and stage topics. Second, a similarity calculation algorithm is used to filter topics. Third, novelty and support are used to identify hot topics. Fourth, three evolution paths (inheritance, merging, and division) are formed for the hot topics. Finally, the effectiveness of the method is verified empirically on 47,896 scientific and technological literature records in the field of intelligent algorithms from Web of Science.
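The path-forming step of this process can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state which similarity measure or threshold is used, so cosine similarity over topic-word distributions and a threshold of 0.5 are assumptions here, and the function names (`cosine`, `link_topics`, `classify_paths`) and the toy topic distributions are hypothetical. In practice, the word distributions would come from LDA models fitted to each stage.

```python
from collections import Counter
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two sparse word -> weight dicts."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def link_topics(prev_stage, next_stage, threshold=0.5):
    """Link topics of adjacent stages whose similarity reaches the threshold.

    The threshold value is an assumption made for illustration only.
    """
    return [
        (a, b)
        for a, dist_a in prev_stage.items()
        for b, dist_b in next_stage.items()
        if cosine(dist_a, dist_b) >= threshold
    ]


def classify_paths(links):
    """Label each link as inheritance, merging, or division.

    A topic with several successors divides; a topic with several
    predecessors is the result of a merge; a one-to-one link is
    inheritance. Division is checked first when both conditions hold.
    """
    successors = Counter(a for a, _ in links)
    predecessors = Counter(b for _, b in links)
    paths = {}
    for a, b in links:
        if successors[a] > 1:
            paths[(a, b)] = "division"
        elif predecessors[b] > 1:
            paths[(a, b)] = "merging"
        else:
            paths[(a, b)] = "inheritance"
    return paths


# Toy topic-word distributions for two adjacent stages (hypothetical data).
stage_1 = {
    "nn_1": {"neural": 0.5, "network": 0.4, "training": 0.1},
    "ga_1": {"genetic": 0.6, "algorithm": 0.3, "optimization": 0.1},
    "pso_1": {"swarm": 0.6, "particle": 0.3, "optimization": 0.1},
}
stage_2 = {
    "nn_2": {"neural": 0.4, "network": 0.4, "deep": 0.2},
    "evo_2": {"genetic": 0.3, "swarm": 0.3, "particle": 0.2, "optimization": 0.2},
}

links = link_topics(stage_1, stage_2)
paths = classify_paths(links)
for (source, target), label in paths.items():
    print(f"{source} -> {target}: {label}")
```

With this toy data, the neural network topic is inherited by its successor, while the genetic algorithm and particle swarm topics both link to the same later topic and are therefore labeled as merging. A different similarity measure or threshold would change which links are formed.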
This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.