Quality and Usability

Incorporating aspects of social perception in synthetic speech

Interest in human-computer interaction and spoken dialog systems has grown rapidly in recent years. Chatbots are becoming increasingly prevalent, especially in customer service and as personal companions and assistants. Most research effort has gone into automatic speech recognition (ASR) and natural language understanding (NLU), which face challenges such as background noise in rooms, overlapping speech, and understanding context.

Beyond recognizing spoken sentences and interpreting users’ intentions, a dialog strategy should aim for natural, helpful, and satisfying communication with users. Because solutions increasingly need to be personalized to users’ individual behavior and preferences, adaptive voice-based interaction is now the focus of numerous applications in academia and industry.

In the healthcare industry, for example, voice-based agents should ideally express a warm, agreeable, and considerate attitude. Effective question-answering systems, by contrast, need to convey confidence, certainty, and trustworthiness through the generated voice in order to engage users and increase credibility, acceptance, and overall user satisfaction.

The aim of this project is to investigate how voice synthesis can convey positive or negative impressions of a voice-driven assistant’s character and personality. In other words, the goal is to transform synthetic voices so that they sound appealing and confident, and to evaluate users’ perceptions and acceptance of these voices.

Based on this motivation, the following research questions are defined:

1. Identification of users’ positive and negative attributions of synthetic voices. Which social speaker attributions do synthetic voices elicit? How can these attributions be assessed?

2. Definition of the acoustic correlates of attributions of synthetic voices. Which acoustic speech parameters contribute to subjective attributions of synthetic voices? How can these attributions be predicted?

3. Transformation of voice attributions into negative and positive ones. How can natural speech be altered to elicit different perceived voice attributions? Which methods can transform synthetic speech to elicit different perceived voice attributions?


Time Frame: 06/2019 - 08/2022

Team Members: Sai Sirisha Rallabandi

Funded by: Deutsche Forschungsgemeinschaft (DFG) MO 1038/29-1