Quality and Usability

Multimedia Content Retrieval

A variety of problems in multimedia processing are based on segmentation, classification and clustering of audio. These include applications such as robust automatic speech recognition (ASR), video stream segmentation, context recognition, browsing and audio/video retrieval. To accomplish this in a human-readable way, especially for end-user applications, audio data is usually organized and indexed using words as tags or labels. Consequently, the objective of audio retrieval systems, in general, is to distinguish among these categories of audio.

Contemporary work in audio classification differs in its categorization and classification scheme according to the proposed application. Typically, content-based approaches to audio information retrieval use, and attempt to recognize, an arbitrary set of high-level audio categories such as animals, bells, crowds, female, music etc. in a given audio clip. This problem quickly becomes difficult as the application scope is generalized further, because hundreds of acoustic sources can be present in any given clip.

In contrast, the framework presented here does not deal with training models for explicit class definitions (such as animal sounds, music, speech etc.). Instead, the framework derives a class-independent representation of an audio clip. It mostly deals with unstructured audio that covers a wide variety of domains or scenes and can contain any number of (unknown) audio sources. Typically, a whole audio clip is represented as a single vector in a latent perceptual space. This makes the otherwise computationally intensive signal-based similarity measure manageable. The method also brings out an underlying latent perceptual structure of audio clips, which is then used to measure their similarity. Additionally, such an approach can be used for a variety of applications without the need for training and/or reconfiguring the retrieval system implementation. For example, music information processing without explicitly extracting musical information is an important avenue that is being explored.
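The idea of representing a whole clip as one vector in a latent space and comparing clips there can be illustrated with a small sketch. This is not the project's implementation: it assumes that frame-level features are quantized against a fixed codebook, that each clip becomes a normalized histogram of codebook units, and that a truncated SVD of the clip-by-unit matrix supplies the latent space. The codebook, the synthetic frames and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def clip_histogram(frames, codebook):
    """Summarize a clip as a normalized histogram of its nearest codebook units."""
    # Squared Euclidean distance from every frame to every codebook entry.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    units = dists.argmin(axis=1)
    counts = np.bincount(units, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

def latent_vectors(histograms, rank):
    """Project clip histograms (one per row) onto the top `rank` right singular vectors."""
    _, _, vt = np.linalg.svd(histograms, full_matrices=False)
    return histograms @ vt[:rank].T

def cosine_similarity(a, b):
    """Cosine of the angle between two latent clip vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical setup: 8 well-separated codebook units in an 8-dimensional feature space.
rng = np.random.default_rng(0)
codebook = 10.0 * np.eye(8)

def synthetic_clip(active_units, n_frames=200):
    """Draw noisy frames around a few codebook units (stand-in for real audio features)."""
    idx = rng.choice(active_units, size=n_frames)
    return codebook[idx] + rng.normal(scale=0.5, size=(n_frames, 8))

clip_a = synthetic_clip([0, 1])   # shares acoustic units with clip_b
clip_b = synthetic_clip([0, 1])
clip_c = synthetic_clip([6, 7])   # built from different units

histograms = np.stack([clip_histogram(c, codebook) for c in (clip_a, clip_b, clip_c)])
latent = latent_vectors(histograms, rank=2)

sim_ab = cosine_similarity(latent[0], latent[1])
sim_ac = cosine_similarity(latent[0], latent[2])
print(f"similarity(a, b) = {sim_ab:.3f}, similarity(a, c) = {sim_ac:.3f}")
```

In this sketch no class labels are involved at any point: clips built from the same units end up close in the latent space, and clips built from different units end up far apart. Comparing short latent vectors is also much cheaper than comparing the frame sequences directly.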

Ongoing Research and Relevant Applications:

  • Multimedia Retrieval Systems for Web 2.0.
    • Audio Classification, Segmentation and Clustering.
    • Audio Event Detection.
    • Acoustic scene classification in handheld devices.
    • Multimodal activity detection (with Hamed Ketabdar)
  • Music Information Processing.
    • Playlist Generation Problem.
    • Highlighting/Thumbnailing.
    • Similarity-based Clustering and/or classification.
    • Music Genre Recognition.
  • Audio Perception and Understanding.
    • Categorization of audio for automatic machine-based processing.
    • Content processing using semantic/perceptual descriptions.

Related Publications:

  • Sundaram and Narayanan. A divide-and-conquer approach to Latent Perceptual Indexing for Web 2.0 Applications. In Proceedings of ICME, Cancun, Mexico, June 2009.
  • Sundaram and Narayanan. Classification of sound clips by two schemes: using onomatopoeia and semantic labels. In Proceedings of ICME, Hanover, Germany, June 2008.
  • Sundaram and Narayanan. Audio retrieval by latent perceptual indexing. In Proceedings of ICASSP, Las Vegas, Nevada, April 2008.
  • Sundaram and Narayanan. Experiments in automatic genre classification of full-length music tracks using audio activity rate. In Proceedings of IEEE International Workshop on Multimedia Signal Processing, Chania, Greece, October 2007.
  • Sundaram and Narayanan. Analysis of audio clustering using word descriptions. In Proceedings of ICASSP, Honolulu, Hawaii, April 2007.
  • Sundaram and Narayanan. Discriminating two types of noise sources using cortical representation and dimension reduction technique. In Proceedings of ICASSP, Honolulu, Hawaii, April 2007.
  • Sundaram and Narayanan. An attribute-based approach to audio description applied to segmenting vocal sections in popular music songs. In Proceedings of MMSP, Victoria, Canada, October 2006.
  • Sundaram and Narayanan. Vector-based representation and clustering of audio using onomatopoeia words. In Proceedings of AAAI 2006 Fall Symposium, Arlington, VA, October 2006.
  • Sundaram and Kyriakakis. Phantom Audio Sources with Vertically Separated Speakers. Audio Engineering Society (AES) Convention Paper 2005.

Time Frame: 12/08 - 12/10

T-Labs Team Members: Shiva Sundaram

Funding by: Deutsche Telekom Laboratories