JACIII Vol.13 No.4 pp. 499-505
doi: 10.20965/jaciii.2009.p0499


Text-Style Conversion of Speech Transcript into Web Document for Lecture Archive

Masashi Ito*, Tomohiro Ohno**, and Shigeki Matsubara***

*Graduate School of Information Science, Nagoya University

**Graduate School of International Development, Nagoya University

***Information Technology Center, Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan

Received: November 25, 2008
Accepted: March 25, 2009
Published: July 20, 2009
Keywords: natural languages, spoken language processing, digital archiving, web contents, paraphrasing

Accumulating spoken documents on the web is of great value to the knowledge society. However, because spontaneous speech is highly redundant, a faithfully transcribed text is hard to read in a web browser and is therefore unsuitable as a web document. This paper proposes a technique for converting spoken documents into web documents for the purpose of building a speech archiving system. The technique edits automatically transcribed texts to improve their readability in the browser. Readable text is generated by applying techniques such as paraphrasing, segmentation, and structuring to the transcribed texts. Editing experiments using lecture data demonstrated the feasibility of the technique, and a prototype spoken-document archiving system was implemented to confirm its effectiveness.
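
As a rough illustration of the three editing steps named in the abstract (paraphrasing, segmentation, and structuring), the following Python sketch converts a toy English transcript into HTML paragraphs. It is not the authors' method: the filler list, the punctuation-based sentence splitting, the fixed paragraph size, and all function names are assumptions made only for this example.

```python
import html
import re

# Hypothetical filler list; the paper's actual paraphrasing rules are not given here.
FILLERS = ("um", "uh", "er")


def remove_fillers(text: str) -> str:
    """Delete whole-word fillers (a crude stand-in for paraphrasing)."""
    pattern = r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b,?\s*"
    return re.sub(pattern, "", text, flags=re.IGNORECASE)


def segment(text: str) -> list[str]:
    """Split after sentence-final punctuation (a crude stand-in for segmentation)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Re-capitalize each sentence, since a removed filler may have carried the capital.
    return [p[0].upper() + p[1:] for p in parts if p]


def to_html(sentences: list[str], per_paragraph: int = 2) -> str:
    """Group sentences into <p> elements (a crude stand-in for structuring)."""
    paragraphs = []
    for i in range(0, len(sentences), per_paragraph):
        chunk = " ".join(sentences[i:i + per_paragraph])
        paragraphs.append("<p>" + html.escape(chunk) + "</p>")
    return "\n".join(paragraphs)


def convert(transcript: str) -> str:
    """Run the three assumed steps in order: paraphrase, segment, structure."""
    return to_html(segment(remove_fillers(transcript)))


if __name__ == "__main__":
    print(convert("Um today we talk about parsing. Uh parsing is hard. It takes time."))
```

A real system for lecture speech would need language-specific rules or learned models at each step; the sketch only shows how the steps could be chained.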

M. Ito, T. Ohno, and S. Matsubara, “Text-Style Conversion of Speech Transcript into Web Document for Lecture Archive,” J. Adv. Comput. Intell. Intell. Inform., Vol.13, No.4, pp. 499-505, 2009.
