Research on Pattern Representation Based on Keyword and Word Embedding in Chinese Entity Relation Extraction
Feiyue Ye and Zhentao Qin
School of Computer Engineering and Science, Shanghai University
99 Shangda Road, Baoshan District, Shanghai 200444, China
With the rapid development of the Internet, it has become increasingly important to extract relations between entities from massive amounts of web text and use them to build knowledge graphs or knowledge bases. In this paper, we focus on pattern representation in relation extraction and extract Chinese entity pairs with high accuracy from large-scale web texts. Previous relation patterns consider only shallow lexical and syntactic features. They neither express pattern context information accurately and in depth nor take keyword information into account. Drawing on recent entity relation extraction techniques and the characteristics of Chinese corpora, we define a pattern representation based on keywords and word embeddings. This representation captures deep semantic features of the context and strengthens the contribution of keyword information to relation extraction. In addition, we propose a word-embedding-based method for identifying the keyword of a sentence. In the experiments, we implement a character relation extraction system on the Chinese Hudong Encyclopedia corpus and evaluate its extraction performance. The experimental results show that the proposed method effectively improves pattern quality and achieves favorable relation extraction performance.
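The abstract names two ingredients, keyword selection from word embeddings and a keyword-weighted pattern representation, but does not spell out either algorithm. The Python below is therefore only a minimal sketch under stated assumptions, not the authors' method. It assumes that the keyword is the content word whose embedding is closest (by cosine similarity) to the centroid of the context's content words. It represents a pattern as an entity-type pair, a keyword embedding, and the mean embedding of the remaining context words. Pattern similarity is a weighted sum in which a hypothetical weight `alpha` favors the keyword term. The toy 3-dimensional vectors, the English glosses standing in for Chinese words, the stopword list, and all function names are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

Vector = list[float]
DIM = 3

# Hypothetical toy embeddings (English glosses for Chinese words). A real
# system would load vectors trained with word2vec or a similar model.
EMBEDDINGS: dict[str, Vector] = {
    "wife":     [1.0, 0.0, 0.0],
    "married":  [0.9, 0.1, 0.1],
    "husband":  [0.9, 0.0, 0.2],
    "director": [0.0, 1.0, 0.1],
    "film":     [0.1, 0.9, 0.2],
    "famous":   [0.2, 0.2, 1.0],
    "is":       [0.1, 0.1, 0.1],
    "of":       [0.1, 0.0, 0.1],
}
STOPWORDS = {"is", "of"}  # hypothetical stopword list


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(words: list[str]) -> Vector:
    """Average embedding of the known words (zero vector if none are known)."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def select_keyword(context: list[str], stopwords: set[str]) -> str | None:
    """Assumed heuristic: the content word closest to the context centroid."""
    candidates = [w for w in context if w in EMBEDDINGS and w not in stopwords]
    if not candidates:
        return None
    centroid = mean_vector(candidates)
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], centroid))


@dataclass
class Pattern:
    entity_types: tuple[str, str]  # e.g. ("PER", "PER") for a character pair
    keyword: Vector                # embedding of the selected keyword
    context: Vector                # mean embedding of the other content words


def build_pattern(entity_types: tuple[str, str], context: list[str],
                  stopwords: set[str]) -> Pattern:
    """Build a pattern from the words between (or around) an entity pair."""
    keyword = select_keyword(context, stopwords)
    rest = [w for w in context if w != keyword and w not in stopwords]
    keyword_vec = EMBEDDINGS[keyword] if keyword is not None else [0.0] * DIM
    return Pattern(entity_types, keyword_vec, mean_vector(rest))


def pattern_similarity(p: Pattern, q: Pattern, alpha: float = 0.6) -> float:
    """Weighted similarity; alpha (hypothetical) up-weights the keyword term."""
    if p.entity_types != q.entity_types:
        return 0.0
    return (alpha * cosine(p.keyword, q.keyword)
            + (1 - alpha) * cosine(p.context, q.context))
```

With these toy vectors, `select_keyword(["is", "famous", "wife", "married", "of"], STOPWORDS)` returns `"married"`. Two patterns that share a context but have the keywords "wife" and "husband" score well above a pair with the keywords "wife" and "director". Weighting the keyword term above the context term is just one simple way to "strengthen keyword information". A real system would use trained embeddings and tune `alpha` on held-out data.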