A variety of problems in multimedia processing are based on segmentation, classification and clustering of audio. These include applications such as robust automatic speech recognition (ASR), video stream segmentation, context recognition, browsing, and audio/video retrieval. To accomplish this in a human-readable way, especially for end-user applications, audio data is usually organized and indexed using words as tags or labels. Consequently, the objective of audio retrieval systems, in general, is to distinguish amongst these categories of audio.
Contemporary work in audio classification differs in its categorization and classification scheme according to the proposed application. Typically, content-based approaches for audio information retrieval attempt to recognize an arbitrary set of high-level audio categories such as animals, bells, crowds, female voices, music, etc. in a given audio clip. This problem quickly becomes difficult as the application scope is generalized, since hundreds of acoustic sources can be present in any given clip.
In contrast, the framework presented here does not train models for explicit class definitions (such as animal sounds, music, speech, etc.). Instead, the framework derives a class-independent representation of an audio clip. It deals mostly with unstructured audio covering a wide variety of domains or scenes that can contain any number of (unknown) audio sources. Typically, a whole audio clip is represented as a single vector in a latent perceptual space, which makes the otherwise computationally intensive signal-based similarity measure manageable. The method also brings out an underlying latent perceptual structure of audio clips and provides a measure of similarity between them. Additionally, such an approach can be used for a variety of applications without retraining or reconfiguring the retrieval system. For example, music information processing without explicitly extracting musical information is an important avenue that is being explored.
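The clip-as-a-single-vector idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: it assumes frame-level features are quantized against a hypothetical codebook of acoustic "units", each clip becomes a count histogram over those units, and a truncated SVD (a standard latent-semantic-style decomposition) projects every clip into a low-dimensional latent space where cosine similarity is cheap to compute. The codebook, dimensions, and synthetic data here are all stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each clip is a sequence of MFCC-like frame vectors,
# and a small codebook of acoustic "units" stands in for a learned one.
n_units, n_clips, feat_dim = 32, 6, 13
codebook = rng.normal(size=(n_units, feat_dim))

def clip_histogram(frames, codebook):
    """Quantize each frame to its nearest codebook unit and count occurrences."""
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    return np.bincount(idx, minlength=len(codebook)).astype(float)

# Clip-by-unit count matrix built from synthetic frame data.
counts = np.stack([clip_histogram(rng.normal(size=(200, feat_dim)), codebook)
                   for _ in range(n_clips)])

# Truncated SVD: each whole clip becomes a single k-dimensional vector
# in a latent space derived from the count matrix.
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 4
latent = U[:, :k] * s[:k]  # one k-dim vector per clip

def similarity(a, b):
    """Cosine similarity between two clips in the latent space."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sim = similarity(latent[0], latent[1])
```

Because every clip is reduced to one short vector, pairwise comparison across a large archive avoids repeated signal-level distance computations, which is the manageability gain described above.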
Ongoing Research and Relevant Applications:
Time Frame: 12/08 - 12/10
T-Labs Team Members: Shiva Sundaram
Funding by: Deutsche Telekom Laboratories