Automatic Extraction of Key Sentences via Word Sense Identification for Chinese Text Summarization

Yau-Hwang Kuo; Hsun-Hui Huang

doi:10.20965/jaciii.2007.p0416

JACIII Vol.11 No.4 pp. 416-422 (2007)

Paper:

CREDIT, Department of Computer Science and Information Engineering, National Cheng Kung University, No.1, Ta-Hsueh Rd., Tainan, Taiwan

Received: April 30, 2006
Accepted: August 16, 2006
Published: April 20, 2007

Keywords: key sentences, text summarization, word sense disambiguation, sense representation, fuzzy transaction

Abstract

In this paper, a novel method of key-sentence extraction is proposed for automatic Chinese text summarization. Its two main components are key-sense/sense-pattern discovery and key-sentence extraction. Since no Chinese lexical database comparable to WordNet was available to the authors, a compromise is adopted: the target Chinese text is word-segmented and POS-tagged, and all of its nouns and verbs are translated into English so that their senses can be disambiguated using WordNet. A distinguishing characteristic of the proposed method is that each sentence is represented by senses, and the key senses in each sentence form a fuzzy transaction. Each entry of the fuzzy transaction is the maximum degree of similarity between the corresponding key sense and the senses in the sentence. A prototype of this automatic Chinese text summarization scheme is constructed, and summary quality is measured with an intrinsic evaluation method based on information-retrieval criteria. Results of applying the prototype to datasets with manually generated summaries are presented.
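To make the fuzzy-transaction representation concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Senses are placeholder strings styled like WordNet synset names, and `toy_sim` is a hypothetical similarity table standing in for a WordNet-based similarity measure, which the abstract does not specify. The helper `precision_recall_f1` assumes that the "information-retrieval criteria" are precision, recall, and F-measure computed over sentence indices; all function names here are illustrative.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

# Senses are plain strings here (placeholder labels styled like WordNet synset
# names). This is an illustrative assumption, not the paper's data structure.
Sense = str
Similarity = Callable[[Sense, Sense], float]

# Hypothetical pairwise similarity degrees in [0, 1]; a real system would
# compute these with a WordNet-based similarity measure instead.
TOY_SIM: Dict[frozenset, float] = {
    frozenset({"economy.n.01", "market.n.01"}): 0.6,
    frozenset({"economy.n.01", "growth.n.01"}): 0.4,
    frozenset({"market.n.01", "growth.n.01"}): 0.3,
}


def toy_sim(a: Sense, b: Sense) -> float:
    """Symmetric toy similarity: 1.0 for identical senses, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def fuzzy_transaction(sentence_senses: Sequence[Sense],
                      key_senses: Sequence[Sense],
                      sim: Similarity) -> Dict[Sense, float]:
    """Map each key sense to its maximum similarity with any sense in the sentence.

    A sentence with no senses yields 0.0 for every key sense.
    """
    return {
        key: max((sim(key, s) for s in sentence_senses), default=0.0)
        for key in key_senses
    }


def precision_recall_f1(extracted: Set[int],
                        reference: Set[int]) -> Tuple[float, float, float]:
    """Compare extracted sentence indices with a manual summary's indices."""
    hits = len(extracted & reference)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    keys = ["economy.n.01", "growth.n.01"]
    sentence = ["market.n.01", "growth.n.01"]
    print(fuzzy_transaction(sentence, keys, toy_sim))
    print(precision_recall_f1({0, 2, 5}, {0, 1, 2, 3}))
```

Run as a script, the first line printed is `{'economy.n.01': 0.6, 'growth.n.01': 1.0}`. The key sense `economy.n.01` takes its best match in the sentence (0.6, against `market.n.01`), while `growth.n.01` matches itself exactly. Taking the maximum over the sentence's senses means a key sense counts as present to the degree of its closest related sense, rather than only on an exact match.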

Cite this article as:

Y. Kuo and H. Huang, “Automatic Extraction of Key Sentences via Word Sense Identification for Chinese Text Summarization,” J. Adv. Comput. Intell. Intell. Inform., Vol.11 No.4, pp. 416-422, 2007.

This article is published under a Creative Commons Attribution-NoDerivatives 4.0 International License.