Quality and Usability


Automatic Evaluation of Interactive Speech-Based Services on the Basis of Learned User Models

As more spoken dialogue systems are deployed for different telecommunication services, the demand for fast and economical development of these systems grows. Up to now, two different aspects of evaluation have been considered: on the one hand, the performance of the integrated system components, such as the speech recognizer, speech understanding unit, dialogue manager and speech synthesizer, is quantified; on the other hand, the overall quality is measured in subjective interaction tests. We assume that such tests can be complemented by a (semi-)automatic, simulation-based evaluation, which would be highly profitable in the early stages of spoken dialogue system development.

So far, automatic end-to-end evaluation is not possible without human evaluators. In addition, quantifying different quality features, e.g. efficiency, comfort, usability and acceptability, is a non-trivial problem in itself. Since quality is the result of a perception and judgment process, measuring these quality aspects generally requires controlled experiments with human test participants.

The SpeechEval project explores new ground in quantifying the quality and usability of spoken dialogue systems with minimal use of human test participants, so that tests can be run while a system is still being designed. The goal of the project is a workbench that allows semi-automatic evaluation of dialogue systems with respect to these aspects.

The approach first trains a user simulation model on a specified domain and then uses this model for testing new, unknown systems and/or domains. To this end, statistical methods that allow effective user simulation will be analysed, with the quality of the simulation and its applicability in the given context serving as performance criteria.
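The project description does not specify which statistical methods are used; as one minimal, hypothetical sketch, a user simulation can be learned from an annotated corpus by estimating P(user act | system act) and then sampling user responses from that distribution. The corpus, dialogue acts and function names below are invented for illustration only.

```python
import random
from collections import Counter, defaultdict

# Hypothetical annotated corpus: each dialogue is a list of
# (system_act, user_act) exchanges recorded with an existing system.
CORPUS = [
    [("greet", "inform"), ("request_date", "inform"), ("confirm", "affirm")],
    [("greet", "inform"), ("request_date", "negate"), ("request_date", "inform")],
    [("greet", "silence"), ("greet", "inform"), ("confirm", "affirm")],
]

def train_user_model(corpus):
    """Estimate P(user_act | system_act) by counting exchanges."""
    counts = defaultdict(Counter)
    for dialogue in corpus:
        for system_act, user_act in dialogue:
            counts[system_act][user_act] += 1
    model = {}
    for system_act, user_counts in counts.items():
        total = sum(user_counts.values())
        model[system_act] = {ua: c / total for ua, c in user_counts.items()}
    return model

def simulate_user(model, system_act, rng=random):
    """Sample a user act in response to a given system act."""
    dist = model[system_act]
    acts, probs = zip(*dist.items())
    return rng.choices(acts, weights=probs)[0]
```

Such a simulator can then be run against a new dialogue system turn by turn, logging the resulting interactions for evaluation; richer models would condition on longer dialogue history rather than only the last system act.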

In addition, a number of machine learning techniques for optimal policy learning and answer generation will be studied in order to allow more realistic human-machine interaction. The simulation results will be validated on real spoken dialogue systems of the project's industry partners.
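The text does not name a specific policy learning technique; a common choice in this setting is reinforcement learning against a simulated user. The sketch below shows tabular Q-learning on an invented toy slot-filling task, with a hard-coded stand-in for the user simulator; all states, actions and rewards are illustrative assumptions, not part of the project.

```python
import random
from collections import defaultdict

# Toy slot-filling task: the system must fill a "date" slot, then close.
ACTIONS = ["request_date", "confirm", "close"]

def simulated_user(state, action, rng):
    """Stand-in for a learned user model: returns (next_state, reward, done)."""
    if action == "request_date" and rng.random() < 0.8:
        return "filled", -1, False      # user supplies the date
    if action == "close" and state == "filled":
        return "done", 20, True         # successful dialogue
    if action == "close":
        return "done", -20, True        # closed without the slot filled
    return state, -1, False             # turn passes, state unchanged

def learn_policy(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning of a dialogue policy against the simulated user."""
    rng = random.Random(seed)
    q = defaultdict(float)              # Q[(state, action)]
    for _ in range(episodes):
        state, done, turns = "empty", False, 0
        while not done and turns < 20:
            if rng.random() < eps:      # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = simulated_user(state, action, rng)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state, turns = nxt, turns + 1
    return q
```

The learned policy asks for the missing slot first and closes once it is filled; the same loop structure applies when the hand-written `simulated_user` is replaced by a user model trained from corpus data.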

Main challenges:

  • Meaningful corpora
  • Robust learning approach
  • Effective user simulation
  • Evaluation aspects and metrics
  • Overall system performance

Expected outcome:

  • New, innovative approach to SDS evaluation
  • Deep insight into the generalisation of learned user strategies
  • Workbench for (semi-)automated SDS evaluation

Cooperation & related projects:

Time Frame: 08/2008 - 01/2011

T-labs Team Members: B. Belmudez, D. Butenkov, K.-P. Engelbrecht, F. Gödde, R. Haak, C. Kühnel, S. Möller, R. Schleicher

Partners: Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI)

Funding by: Investitionsbank Berlin (IBB)