Quality and Usability

Spatial TelephonE conferencing for AsterisK (STEAK)

In the STEAK project, we are implementing a telephone conferencing system that (a) provides spatial audio and (b) is legacy compatible. This system will provide a spatial representation (i.e., "3D audio") of a telephone conference using binaural synthesis (i.e., a separate signal is computed for each ear and presented over a pair of headphones).
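As a rough illustration of binaural synthesis, the sketch below filters one mono talker signal with a separate impulse response for each ear. The function names and the three-tap impulse responses are hypothetical placeholders chosen for readability; they are not measured head-related impulse responses and not the STEAK implementation.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Compute one signal per ear by filtering the same mono source twice."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumption): a source on the listener's left
# reaches the right ear two samples later and at half the level.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5, 0.25], TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
```

The difference in arrival time and level between `left` and `right` is what lets a listener with headphones perceive the talker as coming from one side.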

STEAK has two goals:

1. Implementation of a real spatial conferencing system

In the STEAK project, a legacy-compatible spatial telephone conferencing system is implemented. Participants who join a conference via VoIP (WebRTC or SIP) and can process a stereo signal receive a spatial presentation. Because the system is legacy compatible, participants can also join a conference call via standard telephones; these participants receive only a standard mono mix without spatial cues.

The system will be implemented using only open-source components, and the final implementation will be released as open source.
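The capability-dependent output described above can be sketched as follows. This is a minimal sketch under stated assumptions: the `Participant` class, the `mix_for` function, and the per-talker gain pairs are hypothetical, and simple left/right gains stand in for real binaural rendering. All frames are assumed to have the same length.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    stereo_capable: bool  # True for VoIP clients that can process stereo
    frame: list           # this participant's current mono samples
    gains: tuple          # hypothetical (left, right) gains for this talker


def mix_for(listener, participants):
    """Return the frame the bridge sends to `listener` (own voice excluded)."""
    others = [p for p in participants if p is not listener]
    n = len(listener.frame)
    if not listener.stereo_capable:
        # Legacy telephone: plain mono sum without spatial cues.
        return [sum(p.frame[i] for p in others) for i in range(n)]
    # Stereo client: every other talker is weighted differently per ear.
    left = [sum(p.gains[0] * p.frame[i] for p in others) for i in range(n)]
    right = [sum(p.gains[1] * p.frame[i] for p in others) for i in range(n)]
    return list(zip(left, right))


alice = Participant("alice", True, [1.0, 1.0], (1.0, 0.0))
bob = Participant("bob", True, [0.5, 0.5], (0.0, 1.0))
carol = Participant("carol", False, [0.25, 0.25], (0.5, 0.5))
room = [alice, bob, carol]

mono_for_carol = mix_for(carol, room)    # list of mono samples
stereo_for_alice = mix_for(alice, room)  # list of (left, right) pairs
```

Carol, on a standard telephone, gets a single channel containing Alice and Bob, while Alice gets a two-channel frame in which Bob and Carol are weighted differently per ear.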

2. Research on the advantages of spatial conferencing

Spatial representation in telephone conferencing is expected to reduce the effort participants need to follow the conversation. In the context of the STEAK project, we are going to investigate whether telephony-related impairments such as background noise can be attributed to individual participants of a telephone conference.

STEAK has one limitation: no headtracking

The telephone conferencing system will be implemented as a centralized conferencing bridge (i.e., binaural rendering takes place at the conferencing server and not on each client), and clients will only play back the received signal. Using a centralized bridge avoids sending the signals of all participants to all other participants, but it makes accounting for participants' head rotation practically impossible. Humans explore their sound environment by rotating their head and can thus select and focus on individual sound sources. For a virtual soundscape, this requires a headtracker that measures the head rotation precisely, and the soundscape must be adjusted almost instantaneously whenever the head rotates. With a centralized conferencing bridge, such a low delay cannot be achieved: the rotation must be measured and sent to the server, the server must compute the new soundscape, and the resulting signals must be sent back to the client. Thus, headtracking is out of scope.
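The trade-off in this paragraph can be made concrete with a small calculation. The sketch below counts the audio streams needed with and without a central bridge, and adds up the steps between a head rotation and the updated sound. All function names are hypothetical, and the millisecond values are placeholders for illustration only, not measurements from the project.

```python
def streams_full_mesh(n):
    """Every client sends its signal to each of the other n - 1 clients."""
    return n * (n - 1)


def streams_centralized(n):
    """Every client sends one stream to the bridge and receives one mix."""
    return 2 * n


def rotation_to_sound_delay_ms(sensor, render, playout, uplink=0.0, downlink=0.0):
    """Sum the steps between a head rotation and the adjusted soundscape."""
    return sensor + uplink + render + downlink + playout


# Placeholder values (assumptions): the network legs exist only when
# rendering happens on the server.
STEPS = dict(sensor=5.0, render=5.0, playout=20.0)
delay_client_side = rotation_to_sound_delay_ms(**STEPS)
delay_centralized = rotation_to_sound_delay_ms(**STEPS, uplink=40.0, downlink=40.0)

mesh_streams = streams_full_mesh(5)
bridge_streams = streams_centralized(5)
```

Whatever the actual numbers are, server-side rendering adds two network legs to every head movement, while the number of streams grows linearly with the number of participants instead of quadratically.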

For more details visit the project webpage: www.steakconferencing.de


Team Members: Dennis Guse 

Duration: 01/2016 until 12/2016

Webpage: www.steakconferencing.de