Quality and Usability

Evaluating the quality of speech services using crowdsourcing

The Quality of Experience (QoE) of Internet services has recently gained increased interest from academia and industry. For telephone speech services, long-established recommendations for speech quality evaluation exist. Typically, QoE is evaluated by means of subjective experiments in a laboratory environment, which allows confounding factors, e.g. background noise conditions, to be controlled. To collect reliable results, researchers and standardization bodies such as the International Telecommunication Union (ITU) consider this controlled setting essential for QoE measurements. However, laboratory experiments are costly and time-consuming.

Crowdsourcing (CS) offers new possibilities for QoE research, providing a global pool of participants and allowing QoE experiments to be shifted onto the Internet. The potential benefits of crowdsourced QoE studies are

a) the investigation of participant influence factors due to the diverse population of users,

b) the investigation of environmental influence factors due to the real-life environment of the participants, and

c) reduced costs and turnaround times.

ITU-T Rec. P.808 (on using the CS approach for speech quality assessment) emphasizes the influence of the participants’ characteristics, the test environment, and the playback system on the validity and reliability of results, and details the desired characteristics. However, it provides neither guidance on how to test those characteristics remotely nor a quantitative assessment of their impact.
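One conceivable building block for such a remote test is an automated screening of the participant's acoustic environment from a short background recording. The sketch below is a minimal, illustrative example only: the function names, the level measure (RMS in dBFS), and the acceptance threshold are assumptions for this illustration and are not prescribed by P.808.

```python
import math

def rms_dbfs(samples):
    """Root-mean-square level of a normalized ([-1, 1]) signal in dB full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log(0) for digital silence

def environment_ok(noise_samples, threshold_dbfs=-50.0):
    """Accept the participant's environment only if the recorded
    background noise stays below a chosen level threshold.
    The -50 dBFS default is an illustrative assumption."""
    return rms_dbfs(noise_samples) < threshold_dbfs

# Synthetic examples: a quiet recording passes, a noisy one is rejected.
quiet = [0.0005] * 16000   # ~ -66 dBFS
noisy = [0.05] * 16000     # ~ -26 dBFS
```

In practice such a check would need to account for the unknown microphone gain of each participant's device, which is one of the open questions this project addresses.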

In this project, we will systematically answer the following key research questions based on numerous experiments conducted in the laboratory and via CS: “How should crowdsourcing-based speech quality evaluation experiments be set up to provide valid and reliable results? Specifically, how can the characteristics of the test participants, the test environment, and the playback system be assessed in online tests? Which differences are to be expected between crowdsourcing-based and laboratory speech quality evaluation? In which way do these differences affect the development of instrumental speech quality prediction models?”

The aim of the present research project is to analyze the most important characteristics of the listener, the listening device, and the test environment in CS-based speech quality assessment, and to quantify their impact in comparison to standard laboratory experiments. Furthermore, valid and reliable test methods will be specified to remotely assess relevant characteristics (e.g. environmental noise). The analysis will lead to a proposal for updating ITU-T Rec. P.808. In addition, we will evaluate the performance of existing instrumental models for predicting speech quality on the datasets (speech materials and subjective scores) collected within this project using the CS-based approach. The datasets and resulting recommendations will be made openly available to the research community.
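Evaluating an instrumental model against subjective scores typically boils down to comparing predicted and subjectively obtained MOS values with standard agreement metrics such as the Pearson correlation and the RMSE. The sketch below shows this comparison; the MOS values are invented purely for illustration and do not stem from this project's datasets.

```python
import math
import statistics

def pearson(x, y):
    """Pearson linear correlation between predicted and subjective MOS."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

def rmse(x, y):
    """Root-mean-square error between predicted and subjective MOS."""
    return math.sqrt(statistics.fmean([(a - b) ** 2 for a, b in zip(x, y)]))

# Made-up example values: subjective MOS from a listening test
# vs. predictions of some instrumental model.
subjective = [4.2, 3.1, 2.4, 4.5, 1.8]
predicted  = [4.0, 3.3, 2.1, 4.4, 2.0]
```

A high correlation with low RMSE would indicate that a model developed on laboratory data transfers well to CS-collected scores; systematic deviations would point to the lab-vs-CS differences the project sets out to quantify.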

Time Frame: 04/2021 - 12/2024

Funded by: Deutsche Forschungsgemeinschaft (DFG) MO 1038/32-1