100288: Whispered Speech Prosody Modeling for TTS Synthesis (abstract)

Valery A. Petrushin, The Nielsen Company
Liliya I. Tsirulnik, United Institute of Informatics Problems of NAS of Belarus
Veronika Makarova, University of Saskatchewan

This paper is devoted to modeling the prosody of whispered Russian speech. The practical purpose of the research is to extend voice cloning techniques to the whispered speech modality. The authors analyze the prosodic features that contribute to the expression of sentence-type intonation in whispered speech. The investigation covers intonation contours in complete and incomplete declaratives, as well as in interrogatives and exclamations. Since the fundamental frequency is absent in whisper, the major role in conveying sentence-type intonation is taken over by formant values. To model the prosody of whispered speech, an extension of the Accent Unit Portrait Model is proposed. The paper outlines how melodic, rhythmic, and dynamic (energy) portraits of accent units can be built and used by a concatenative text-to-speech synthesizer to modify whispered speech.