Quality and Usability

Hierarchical Multimodal Interfaces

Motivation & Project Description

In hierarchical processing, a general problem is decomposed into sub-problems according to the hierarchical structure inherent in the problem. In this project, we investigate the design of practical, low-complexity, and adaptive/reconfigurable multi-modal interface systems based on this concept. Hierarchical processing allows efficient use of small amounts of training data and limited computational resources, integration of prior knowledge, and proper combination of different modalities. It also decouples training and inference at the different levels, so that an existing system can easily be reconfigured for new applications and situations. The idea of hierarchical interfaces will initially be investigated for the following applications:

  • User activity and context detection using mobile phones
  • General purpose audio switches based on speech and non-speech event detection
  • Call classification based on emotion, age, gender and language for automated voice portals

User activity and context detection using mobile phones

As one of the main tracks in this project, we investigate hierarchical interface design for detecting user activity and environmental context using mobile phones. Many mobile phones are equipped with sophisticated sensors such as accelerometers. A mobile phone carried by a user can therefore record the user's movement pattern in terms of acceleration. This pattern differs between activities such as walking, sitting, running, or travelling in a car. In addition, all mobile phones can capture audio, which gives some information about the ongoing situation, such as being in a meeting, at a party, or in the street. An algorithm can use acceleration, audio (from the microphone), or a combination of both to detect the user's current activity. Detecting user activity in this manner has several applications. It can be used to switch the phone ringer on or off automatically, change the ring volume, and prioritize or adapt other phone functions depending on the user's current situation. It can also be used to provide a summary of daily activities and to study the relation between a user's activities and their efficiency. This can be especially relevant for the management of an organization that wants to study the collective activity patterns of its employees and find ways to maximize their efficiency. In addition, activity detection can be used in taking care of the elderly and children: it can provide a medical doctor with a summary of daily physical activities, as well as warnings in case of unexpected movements (a fall, an accident, etc.).
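To make the hierarchical idea concrete, the following is a minimal Python sketch, not the project's actual algorithm. It assumes a two-level design: at the first level, each modality is reduced independently to a coarse label (a motion label from accelerometer samples, a sound label from microphone samples); at the second level, a lookup table combines the two labels into a context. All function names, thresholds, labels, and rules here are hypothetical and chosen only for illustration.

```python
import math
from statistics import pstdev


def motion_level(accel, still_max=0.5, walk_max=3.0):
    """Level 1 (acceleration): coarse motion label.

    accel is a list of (x, y, z) readings in m/s^2. The thresholds are
    illustrative assumptions, not values from the project.
    """
    magnitudes = [math.sqrt(x * x + y * y + z * z) for x, y, z in accel]
    spread = pstdev(magnitudes)  # variation of the acceleration magnitude
    if spread < still_max:
        return "still"
    if spread < walk_max:
        return "moderate"
    return "vigorous"


def sound_level(audio, quiet_max=0.05):
    """Level 1 (audio): coarse loudness label from RMS energy.

    audio is a list of samples normalized to [-1, 1]. The threshold is
    an illustrative assumption.
    """
    rms = math.sqrt(sum(s * s for s in audio) / len(audio))
    return "quiet" if rms < quiet_max else "loud"


# Level 2: hypothetical rules combining the two level-1 labels.
CONTEXT_RULES = {
    ("still", "quiet"): "sitting in a quiet place",
    ("still", "loud"): "in a meeting or at a party",
    ("moderate", "quiet"): "walking",
    ("moderate", "loud"): "walking in the street",
    ("vigorous", "quiet"): "running",
    ("vigorous", "loud"): "running",
}


def detect_context(accel, audio, rules=CONTEXT_RULES):
    """Combine the per-modality labels using the level-2 rule table."""
    return rules[(motion_level(accel), sound_level(audio))]


# Usage: a phone lying still in a silent room.
resting_accel = [(0.0, 0.0, 9.81)] * 50
silence = [0.0] * 100
print(detect_context(resting_accel, silence))
```

Because the second level only sees the labels produced by the first, it can be swapped without touching the per-modality detectors. For example, passing a different `rules` table that maps the same label pairs to ringer profiles would reuse the first level unchanged, which is the kind of reconfiguration the hierarchical design is meant to allow.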

Expected outcome:


  • Algorithms & modeling techniques for hierarchical design of multi-modal interfaces
  • Extending discriminative model training and evidence estimation
  • Some practical applications (demonstrations):


    • Context and activity detection using mobile phones
    • General purpose audio switches based on speech and non-speech event detection
    • Contribution to call classification based on emotion, age, gender and language


  • Research results and publications, patents, etc.


Time Frame: 02/2009 – 02/2011

T-labs Team Members: Hamed Ketabdar, Shiva Sundaram, Sebastian Möller, Tim Polzehl

Funding by: Deutsche Telekom Laboratories