
JACIII Vol.16 No.2 pp. 335-340
doi: 10.20965/jaciii.2012.p0335


Recognition of Emotions on the Basis of Different Levels of Speech Segments

Klára Vicsi and Dávid Sztahó

Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, 2 Magyar tudósok körútja, Budapest 1117, Hungary

Received: September 15, 2011
Accepted: November 15, 2011
Published: March 20, 2012
Keywords: emotion recognition, intonational phrase, support vector machines
Emotions play a very important role in human-human and human-machine communication. They can be expressed by voice, bodily gestures, and facial movements. People’s acceptance of any kind of intelligent device depends, to a large extent, on how the device reflects emotions, which is why automatic emotion recognition has recently become a research topic. In this paper we deal with automatic emotion recognition from the human voice. Numerous papers in this field deal with database creation and with the examination of acoustic features appropriate for such recognition, but only a few attempts have been made to compare the different segmentation units needed to recognize emotions in spontaneous speech properly. In the Laboratory of Speech Acoustics, experiments were run to examine the effect of diverse speech segment lengths on recognition performance. An emotional database was prepared on the basis of three different segmentation levels: word, intonational phrase, and sentence. Automatic recognition tests were conducted using support vector machines with four basic emotions: neutral, anger, sadness, and joy. The analysis of the results clearly shows that speech units the size of an intonational phrase give the best performance for emotion recognition in continuous speech.
Cite this article as:
K. Vicsi and D. Sztahó, “Recognition of Emotions on the Basis of Different Levels of Speech Segments,” J. Adv. Comput. Intell. Intell. Inform., Vol.16 No.2, pp. 335-340, 2012.
