Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis
Vanmassenhove, Eva; Cabral, João P.; Haider, Fasih
Abstract
Generating expressive speech is a major challenge for text-to-speech synthesis of audiobooks. One of the most important factors is the variation in speech emotion or voice style. In this work, we developed a method to predict the emotion of a sentence so that it can be conveyed through the synthetic voice. The method combines a standard emotion-lexicon-based technique with the polarity scores (positive/negative polarity) provided by a less fine-grained sentiment analysis tool, in order to obtain more accurate emotion labels. The primary goal of this emotion prediction tool was to select the type of voice (one of the emotions or neutral) for a sentence given as input to a state-of-the-art HMM-based Text-to-Speech (TTS) system. In addition, we combined the emotion prediction from text with a speech clustering method to select the utterances with emotion while building the emotional corpus for the speech synthesizer. Speech clustering is a popular approach for dividing speech data into subsets associated with different voice styles. The challenge here is to determine the clusters that map to the basic emotions in an audiobook corpus that contains a high variety of speaking styles, in a way that minimizes the need for human annotation. The evaluation of emotion classification from text showed that, in general, our system can obtain accuracy close to that of human annotators. Results also indicate that this technique is useful for selecting utterances with emotion when building expressive synthetic voices.
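The combination described in the abstract, lexicon-based emotion labels checked against a coarser polarity score, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy lexicon, the emotion-to-polarity mapping, the `threshold` parameter, the rule that discards emotions inconsistent with the polarity, and the function name `predict_emotion`. The polarity score is assumed to come from an external sentiment analysis tool and is simply passed in as a number in [-1, 1].

```python
from collections import Counter

# Hypothetical toy lexicon (word -> emotion); illustrative only, not the paper's lexicon.
EMOTION_LEXICON = {
    "happy": "joy",
    "delighted": "joy",
    "cried": "sadness",
    "lonely": "sadness",
    "furious": "anger",
    "terrified": "fear",
}

# Assumed polarity of each emotion class (+1 positive, -1 negative).
EMOTION_POLARITY = {"joy": 1, "sadness": -1, "anger": -1, "fear": -1}


def predict_emotion(sentence: str, polarity_score: float, threshold: float = 0.1) -> str:
    """Return an emotion label, or "neutral" when the evidence is weak or contradictory.

    polarity_score is assumed to be supplied by an external sentiment tool.
    """
    # Crude tokenization: split on whitespace and strip surrounding punctuation.
    tokens = [t.strip(".,!?;:\"'").lower() for t in sentence.split()]

    # Step 1: lexicon lookup, counting how often each emotion is evoked.
    counts = Counter(EMOTION_LEXICON[t] for t in tokens if t in EMOTION_LEXICON)

    # No lexicon hits, or a near-zero polarity, is treated as neutral.
    if not counts or abs(polarity_score) < threshold:
        return "neutral"

    # Step 2: keep only emotions whose assumed polarity agrees with the score's sign.
    sign = 1 if polarity_score > 0 else -1
    consistent = {e: c for e, c in counts.items() if EMOTION_POLARITY[e] == sign}
    if not consistent:
        return "neutral"

    # Step 3: pick the most frequent remaining emotion.
    return max(consistent, key=consistent.get)
```

For example, `predict_emotion("I am not happy.", -0.5)` returns `"neutral"` in this sketch: the lexicon alone would suggest joy, but the negative polarity score rules it out. This is the kind of correction a polarity signal can provide over a purely word-level lookup, although the actual combination rule used by the authors may differ.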
Description
Funding Information: This research is supported by the Science Foundation Ireland (Grant 13/RC/2106) as part of ADAPT (www.adaptcentre.ie) and EU FP7-METALOGUE project under Grant No. 611073, at Trinity College Dublin, and by the Dublin City University Faculty of Engineering & Computing under the Daniel O’Hare Research Scholarship scheme. Publisher Copyright: © 2016, 9th ISCA Speech Synthesis Workshop, SSW 2016. All rights reserved.
Date
2016
Keywords
audiobooks, emotion, expressive speech synthesis, sentiment analysis, speech clustering
Citation
Vanmassenhove, E, Cabral, J P & Haider, F 2016, 'Prediction of Emotions from Text using Sentiment Analysis for Expressive Speech Synthesis', Paper presented at 9th ISCA Speech Synthesis Workshop, SSW 2016, Sunnyvale, United States, 13/09/16 - 15/09/16 pp. 21-26.
