Chinese Person Name Disambiguation Based on Two-Stage Clustering
Jie Zhou*, Bicheng Li**, and Yongwang Tang*
*Zhengzhou Information Science and Technology Institute
Zhengzhou 450002, China
**Computer Science and Technology Institute, Huaqiao University
Xiamen 361021, China
Person name disambiguation by clustering partitions name mentions according to the real-world person entities they refer to. Existing methods cannot effectively identify the important features needed to disambiguate person names. This paper presents a Chinese person name disambiguation method based on two-stage clustering. The method adopts a stage-by-stage processing model to identify and use different types of important features. First, we extract three kinds of core evidence (direct social relations, indirect social relations, and common description prefixes), recognize document pairs that refer to the same person entity, and produce an initial, high-precision clustering of person names. Then, we take the result of the initial clustering as new input, use the statistical properties of multiple documents to recognize and evaluate important features, and build a double-vector representation of each cluster (a cluster feature vector and an important feature vector). From these representations, the final clustering of person names is generated, which effectively improves recall. Experiments conducted on the CLP2010 Chinese person name disambiguation dataset show that the method performs well on person name clustering disambiguation.
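The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the feature extraction, the weighting scheme, or the merging rule, so everything concrete here is an assumption. Each document is a hypothetical dict with an `evidence` set (pre-extracted core evidence strings) and a `words` list. Stage one links any two documents that share an evidence item using union-find. Stage two greedily merges clusters by a weighted cosine similarity over the two vectors. The parameters `min_share`, `alpha`, and `threshold` are illustrative choices, not values from the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def stage_one(docs):
    """Stage 1 (assumed rule): link documents sharing any core evidence item."""
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(docs)), 2):
        if docs[i]["evidence"] & docs[j]["evidence"]:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def cluster_vectors(docs, cluster, min_share=0.6):
    """Build the cluster feature vector and the important feature vector.

    Assumption: a word is "important" if it occurs in at least
    min_share of the documents in the cluster.
    """
    feature, doc_freq = Counter(), Counter()
    for i in cluster:
        feature.update(docs[i]["words"])
        doc_freq.update(set(docs[i]["words"]))
    important = Counter(
        {w: feature[w] for w, df in doc_freq.items() if df / len(cluster) >= min_share}
    )
    return feature, important


def stage_two(docs, clusters, alpha=0.5, threshold=0.5):
    """Stage 2 (assumed rule): greedily merge clusters with similar vectors."""
    clusters = [list(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        vecs = [cluster_vectors(docs, c) for c in clusters]
        for a, b in combinations(range(len(clusters)), 2):
            sim = alpha * cosine(vecs[a][0], vecs[b][0]) + (1 - alpha) * cosine(
                vecs[a][1], vecs[b][1]
            )
            if sim >= threshold:
                clusters[a].extend(clusters.pop(b))  # b > a, so index a is stable
                merged = True
                break  # vectors are stale after a merge; recompute
    return [sorted(c) for c in clusters]


# Toy input with made-up evidence strings and words.
docs = [
    {"evidence": {"rel:Wang Fang"}, "words": ["professor", "physics", "university"]},
    {"evidence": {"rel:Wang Fang", "prefix:professor"}, "words": ["physics", "laboratory", "university"]},
    {"evidence": set(), "words": ["physics", "university", "lecture"]},
    {"evidence": {"prefix:striker"}, "words": ["football", "goal", "league"]},
]
initial = stage_one(docs)
final = stage_two(docs, initial)
print(initial, final)
```

In this toy input, the first two documents are joined in stage one by a shared relation. The third document carries no core evidence and can only be attached in stage two through vocabulary overlap, which mirrors the abstract's split between a high-precision first pass and a recall-oriented second pass.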