Automatic Extraction of Key Sentences via Word Sense Identification for Chinese Text Summarization
Yau-Hwang Kuo and Hsun-Hui Huang
CREDIT, Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, Ta-Hsueh Rd., Tainan, Taiwan
This paper proposes a novel key-sentence extraction method for automatic Chinese text summarization. It has two main components: discovery of key senses and sense patterns, and extraction of key sentences. Because no Chinese lexical database comparable to WordNet was available to the authors, the method adopts a compromise: the target Chinese text is word-segmented and POS-tagged, and all nouns and verbs are translated into English so that their senses can be disambiguated with WordNet. The distinguishing feature of the method is that each sentence is represented by senses, and the key senses in each sentence form a fuzzy transaction. Each entry of the fuzzy transaction is the maximum similarity degree between the corresponding key sense and the senses in the sentence. A prototype of this summarization scheme was built, and summary quality is measured with an intrinsic evaluation based on information-retrieval criteria. Results of applying the prototype to datasets with manually generated summaries are reported.
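The fuzzy-transaction construction described above can be illustrated with a minimal sketch. It assumes senses are plain string identifiers and that some sense-to-sense similarity function returning values in [0, 1] is available. The names `fuzzy_transaction`, `toy_similarity`, and the `TOY_SIM` table are hypothetical and chosen only for illustration; the similarity values are made up and do not come from WordNet or from the paper.

```python
from typing import Callable, Dict, Sequence

# Hypothetical similarity table (illustrative values only, not from WordNet).
TOY_SIM = {
    ("bank#1", "money#1"): 0.8,
    ("bank#1", "river#1"): 0.1,
    ("summary#1", "abstract#2"): 0.9,
}


def toy_similarity(a: str, b: str) -> float:
    """Symmetric lookup: 1.0 for identical senses, 0.0 for unknown pairs."""
    if a == b:
        return 1.0
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


def fuzzy_transaction(
    key_senses: Sequence[str],
    sentence_senses: Sequence[str],
    similarity: Callable[[str, str], float],
) -> Dict[str, float]:
    """Map each key sense to its maximum similarity with any sense in the sentence.

    A sentence with no senses yields 0.0 for every key sense (an assumption
    made here for the sketch; the abstract does not specify this case).
    """
    return {
        key: max((similarity(key, s) for s in sentence_senses), default=0.0)
        for key in key_senses
    }


if __name__ == "__main__":
    keys = ["bank#1", "summary#1"]
    sentence = ["money#1", "river#1"]
    print(fuzzy_transaction(keys, sentence, toy_similarity))
```

In a fuller pipeline, `toy_similarity` would presumably be replaced by a WordNet-based sense similarity measure, and the resulting per-sentence transactions would feed the key-sentence extraction step.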