Quality and Usability

Simulating Conversations for the Prediction of Speech Quality (Thilo Michael)

The measurement and prediction of speech quality are crucial planning tools for Voice over IP (VoIP) communication providers. Current instrumental models that predict the quality of speech in a conversation scenario rely mainly on parameters of the transmission system. However, for some degradations, it has been shown that the impact on the conversation, and thus the perceived quality, cannot be modeled by the transmission parameters alone. The effect of transmission delay on a telephone conversation depends on conversational interactivity, because the delayed speech signal slows down the turn-taking of the conversation partners. The impact of packet loss, although audible in a listening situation, also depends on which part of the transmitted information is lost and, thus, on whether the conversation partner needs to resolve a misunderstanding through additional repair dialogue. In conversations where these impairments co-occur, interaction effects may arise, as the meta-communication caused by lost packets is, in turn, affected by transmission delay. Because current instrumental quality prediction models consider neither these factors nor their interaction, they cannot capture these effects. This thesis introduces conversation simulation as a new approach to the instrumental prediction of conversational quality. It describes a simulation architecture, based on incremental spoken dialogue processing, that can model standardized conversation scenarios at the conceptual, turn-taking, and speech-signal levels. In particular, changes in turn-taking under delayed transmission and the retransmission of information due to packet loss are modeled and evaluated against empirical conversations. The resulting simulated conversations are assessed with methods from the fields of spoken dialogue systems and speech quality, yielding parameters that represent the changes in conversations caused by delay and packet loss.
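To illustrate why delay slows down turn-taking, here is a minimal toy sketch. It is not the thesis's simulation architecture. It assumes strictly alternating turns with fixed utterance durations and fixed reaction gaps, and a symmetric one-way delay. The names Turn and simulate_turn_taking are hypothetical, introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    duration_s: float  # length of the utterance in seconds
    gap_s: float       # reaction time after *hearing* the end of the previous turn


def simulate_turn_taking(turns: list[Turn], one_way_delay_s: float) -> tuple[float, list[float]]:
    """Toy model of alternating turns over a delayed channel (hypothetical).

    Returns the conversation duration (start of the first turn to the end of
    the last turn, both at the speakers' mouths) and, for each speaker change,
    the gap the previous speaker perceives before hearing the reply.
    """
    end = turns[0].duration_s  # the first turn starts at t = 0
    perceived_gaps = []
    for turn in turns[1:]:
        # The next speaker only hears the previous turn end after the delay.
        start = end + one_way_delay_s + turn.gap_s
        # The reply reaches the previous speaker one more delay later.
        perceived_gaps.append(start + one_way_delay_s - end)
        end = start + turn.duration_s
    return end, perceived_gaps


if __name__ == "__main__":
    dialogue = [Turn(2.0, 0.0), Turn(1.5, 0.3), Turn(1.0, 0.2)]
    for delay in (0.0, 0.4, 0.8):
        total, gaps = simulate_turn_taking(dialogue, delay)
        print(f"delay={delay:.1f}s total={total:.2f}s gaps={[round(g, 2) for g in gaps]}")
```

In this toy model, each speaker change adds the one-way delay once to the total duration and twice to the gap the previous speaker perceives. It keeps reaction times fixed and ignores overlaps and interruptions, which a full conversation simulation would need to model.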
To use the parameters extracted from the simulated conversations, the fullband E-model, a standardized parametric model, is extended to account for conversational interactivity and bursty packet loss. Finally, conversational quality is predicted from the extended E-model and the parameters of the simulated conversations.
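For orientation, the following sketch shows where a bursty-loss term could enter the fullband E-model. It assumes the G.107.2 form R_FB = R0,FB − Is,FB − Id,FB − Ie,eff,FB + A with R0,FB = 148 and Ie,eff,FB = Ie,FB + (R0,FB − Ie,FB)·Ppl/(Ppl + Bpl). The burst_r parameter divides Ppl in the same way that BurstR does in narrowband ITU-T G.107. This is a hypothetical extension for illustration only, not necessarily the one developed in the thesis. Is, Id and A are taken as inputs rather than computed.

```python
R0_FB = 148.0  # basic signal-to-noise term of the fullband E-model (ITU-T G.107.2)


def burst_ratio(p: float, q: float) -> float:
    """BurstR of a two-state Markov loss model, as defined in narrowband G.107.

    p: probability of moving from the 'found' to the 'lost' state,
    q: probability of moving from the 'lost' to the 'found' state.
    """
    if p + q <= 0:
        raise ValueError("p + q must be positive")
    return 1.0 / (p + q)


def loss_percentage(p: float, q: float) -> float:
    """Stationary packet-loss percentage Ppl of the same Markov model."""
    return 100.0 * p / (p + q)


def ie_eff_fb(ie_fb: float, bpl: float, ppl: float, burst_r: float = 1.0) -> float:
    """Effective equipment impairment factor.

    burst_r = 1 reproduces random loss. Dividing ppl by burst_r is a
    HYPOTHETICAL extension mirroring narrowband G.107, not part of G.107.2.
    """
    if ppl == 0:
        return ie_fb
    return ie_fb + (R0_FB - ie_fb) * ppl / (ppl / burst_r + bpl)


def rating_fb(is_fb: float, id_fb: float, ie_eff: float, a: float = 0.0) -> float:
    """Fullband transmission rating R_FB from precomputed impairment factors."""
    return R0_FB - is_fb - id_fb - ie_eff + a


if __name__ == "__main__":
    p, q = 0.02, 0.3  # example Markov transition probabilities
    ppl, br = loss_percentage(p, q), burst_ratio(p, q)
    for b in (1.0, br):
        ie = ie_eff_fb(ie_fb=10.0, bpl=10.0, ppl=ppl, burst_r=b)
        print(f"BurstR={b:.2f} Ie,eff={ie:.1f} R_FB={rating_fb(0.0, 0.0, ie):.1f}")
```

In this form, a BurstR above 1 (losses clustered in bursts) raises Ie,eff,FB for the same average loss rate. The interaction with delay and repair dialogue discussed above is not represented here.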

Download @TU Berlin