Speech Communication

Proposed Topics

Our instructors have put together a list of possible topics for final theses at the Chair of Speech Communication. You are welcome to use them for your bachelor’s or master’s thesis.

Speech in advertising through the ages

Description

Data: Examples of advertising from YouTube with the dependent variable time (decade) and the independent variables speaker gender, marketed item, company, target group, etc.

Method:

Extraction of comparable phrases and acoustic/phonetic analyses

Robust features for weak labeling

Research question:

Weak labeling refers to annotating data using machine classifiers for the purpose of self-optimization.

You will examine several acoustic parameters with regard to their suitability for weak labeling. Specifically:

  • An annotated (e.g. emotional) database is used as a reference to train a “seed model” and as a test set.
  • A second, non-annotated database is weakly labeled to supplement the training data.

The results will be evaluated for the following two outcomes:

  • The classifier's performance on the test set should improve.
  • A perception experiment should confirm the perceptual relevance of the weakly labeled samples.
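
The procedure above can be sketched as a small self-training loop. This is only an illustration under stated assumptions, not the method prescribed for the thesis: the one-dimensional feature, the labels, the toy values, the nearest-centroid classifier, and the `min_margin` threshold are all hypothetical. The toy data was chosen so that the retrained model improves on the test set, which is not guaranteed with real data.

```python
from statistics import mean

def train_centroids(samples):
    """Fit a nearest-centroid model: one mean feature value per label."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(centroids, value):
    """Return (label, margin). Assumes at least two labels.

    The margin is the gap between the distances to the two closest
    centroids and serves as a crude confidence score.
    """
    ranked = sorted(centroids, key=lambda label: abs(value - centroids[label]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(value - centroids[runner_up]) - abs(value - centroids[best])
    return best, margin

def accuracy(centroids, test_set):
    """Share of test samples whose predicted label matches the annotation."""
    hits = sum(predict(centroids, value)[0] == label for value, label in test_set)
    return hits / len(test_set)

def weak_label(seed_train, unlabeled, min_margin):
    """Train a seed model, label unannotated samples, keep confident ones, retrain."""
    seed_model = train_centroids(seed_train)
    accepted = []
    for value in unlabeled:
        label, margin = predict(seed_model, value)
        if margin >= min_margin:
            accepted.append((value, label))
    weak_model = train_centroids(seed_train + accepted)
    return seed_model, weak_model, accepted

# Toy data, invented for illustration: (feature value, label) pairs.
seed_train = [(110.0, "neutral"), (120.0, "neutral"),
              (200.0, "aroused"), (190.0, "aroused")]
test_set = [(118.0, "neutral"), (158.0, "neutral"),
            (205.0, "aroused"), (170.0, "aroused")]
unlabeled = [100.0, 105.0, 150.0, 230.0, 240.0]  # no annotations available

seed_model, weak_model, accepted = weak_label(seed_train, unlabeled, min_margin=30.0)
print("accepted weak labels:", accepted)
print("seed accuracy:", accuracy(seed_model, test_set))
print("weak-label accuracy:", accuracy(weak_model, test_set))
```

Comparing the two accuracies corresponds to the first outcome above. The accepted samples are the candidates for the perception experiment.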

Machine age recognition

Research question:

Can machine age recognition be improved using a rule-based approach?

Data:

(Re)synthesized speech data for which certain acoustic aspects have been systematically changed based on the hypothesis

Classification experiment:

The synthesized stimuli are added to the training data of a machine age classifier (perceptual age), and a test set is used to check whether recognition accuracy increases.

Analysis through synthesis: age in voice

Research question:

What makes a voice sound older or younger?

Data:

(Re)synthesized speech data for which certain acoustic aspects have been systematically changed based on the hypothesis

Perception experiment:

The synthesized stimuli are played to several listeners who will then estimate the age of the “speaker.”

Modeling emotional speech expression with SSML

Speech data:

Synthesized data with neutral or intentionally emotional content, systematically varied according to prosodic parameters

Traits:

Suitable traits are selected from literature and SSML (Speech Synthesis Markup Language) control parameters are derived and parameterized.
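
As a rough illustration of how SSML control parameters could be varied systematically, the sketch below generates one SSML document per combination of pitch and rate using the standard `<prosody>` element. The carrier sentence, the chosen levels, and the helper name `ssml_variant` are assumptions made for this example. Which attribute values a given synthesizer actually honors has to be checked against that system's documentation.

```python
from itertools import product
from xml.sax.saxutils import escape

def ssml_variant(text, pitch, rate):
    """Wrap text in an SSML <prosody> element with the given pitch and rate."""
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xml:lang="en-US">'
        f'<prosody pitch="{pitch}" rate="{rate}">{escape(text)}</prosody>'
        '</speak>'
    )

# Hypothetical parameter grid: relative pitch change and speaking rate.
PITCH_LEVELS = ["-15%", "+0%", "+15%"]
RATE_LEVELS = ["80%", "100%", "120%"]

# One stimulus per (pitch, rate) combination, keyed by its parameters so that
# listener judgments can later be related to the variation.
stimuli = {
    (pitch, rate): ssml_variant("The train leaves at noon.", pitch, rate)
    for pitch, rate in product(PITCH_LEVELS, RATE_LEVELS)
}

for params, document in stimuli.items():
    print(params, document)
```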

Perception verification:

Survey of a representative group of listeners about relevance

Analysis:

  • Correlation of listener judgments and systematic variation
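
One possible way to quantify this relationship is a Pearson correlation between the varied parameter and the mean listener rating per stimulus. The sketch below assumes a single varied parameter (a relative pitch shift in percent) and a 1 to 5 rating scale. The numbers are invented for illustration and are not results.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)

# Hypothetical ratings: pitch shift (percent) -> ratings from three listeners.
ratings = {
    -15: [2, 3, 2],
    0: [3, 4, 3],
    15: [4, 5, 5],
}

levels = sorted(ratings)
mean_ratings = [mean(ratings[level]) for level in levels]
r = pearson(levels, mean_ratings)
print("mean rating per level:", dict(zip(levels, mean_ratings)))
print("Pearson r:", round(r, 3))
```

For ordinal rating scales, a rank-based coefficient such as Spearman's may be the more appropriate choice.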

Valence in voices

Speech data:

Publicly accessible data collection annotated for valence, e.g. MSP-Podcast

Traits:

Suitable traits are selected from literature and a complex trait is derived and parameterized.

Analysis:

  • Regression/classification of valence using the derived trait, compared with standard traits and a machine learning approach
  • Manual analysis of selected speech samples

Smiling while speaking

Speech data:

A representative collection of speech samples with and without smiling, taken from videos

Perception verification:

Survey of a representative group of listeners about smile recognition

Traits:

Suitable traits are selected from literature and measured in representative segments.

Analysis:

  • Check of the correlation among the listeners' judgments
  • Analysis of correlations between ground truth (smiling in the videos), acoustic traits, and listener judgments
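
For the comparison between the ground truth and the listener judgments, a chance-corrected agreement measure can complement a correlation. The sketch below uses Cohen's kappa, which is a technique chosen for this example and not one named in the topic description. The labels and data are invented.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences.

    Undefined (division by zero) if both sequences use one identical label
    throughout, because expected agreement is then 1.
    """
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical data: smiling as seen in the video vs. one listener's judgment.
ground_truth = ["smile", "smile", "smile", "smile",
                "neutral", "neutral", "neutral", "neutral"]
listener = ["smile", "smile", "smile", "neutral",
            "neutral", "neutral", "neutral", "smile"]

kappa = cohens_kappa(ground_truth, listener)
print("Cohen's kappa:", kappa)
```

A kappa of 0 means agreement at chance level and 1 means perfect agreement. The same function could be applied to pairs of listeners.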

Age recognition over time

Speech data:

You will collect samples of speeches by prominent figures from YouTube and research the speakers' ages, e.g. 10 people per gender and language, with 3 similar segments for each subject.

Traits:

Suitable traits are selected from literature and measured in representative segments.

Analysis:

Speaker-independent analysis of how trait changes correlate with age

Free-form vs. reading – a comparison of two speaking styles using your own recordings

Suitable for a bachelor’s thesis

Speech data:

You record yourself reading a text (e.g. “The North Wind and the Sun” by Aesop) and speaking freely (e.g. by describing a picture).

Analysis:

Linguistic comparison of speech traits when reading aloud vs. speaking freely, based on your own recordings, including the derivation of general differences and similarities

Variation of pronunciation based on speaking speed

Suitable for a bachelor’s thesis

Speech data:

You record yourself speaking more slowly and more quickly.

Analysis:

Comparison of traits of faster and slower speech, e.g. pronunciation precision and syllable duration
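
As a small worked example of one such trait, the sketch below derives mean syllable duration and articulation rate (syllables per second, pauses excluded) from syllable boundaries. It assumes the boundaries have already been annotated, for instance by hand. The interval values are invented.

```python
def rate_metrics(syllables):
    """Compute duration measures from (start_s, end_s) syllable intervals.

    Pauses are excluded simply because they are not part of any interval.
    """
    durations = [end - start for start, end in syllables]
    total_speech_time = sum(durations)
    return {
        "mean_syllable_duration_s": total_speech_time / len(durations),
        "articulation_rate_syll_per_s": len(durations) / total_speech_time,
    }

# Hypothetical annotations of the same four syllables at two speaking speeds.
slow = [(0.0, 0.25), (0.25, 0.5), (0.75, 1.0), (1.0, 1.25)]  # 0.25 s pause
fast = [(0.0, 0.125), (0.125, 0.25), (0.25, 0.375), (0.375, 0.5)]

slow_metrics = rate_metrics(slow)
fast_metrics = rate_metrics(fast)
print("slow:", slow_metrics)
print("fast:", fast_metrics)
```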

Acoustic comparison of dubbing voices

Suitable for a bachelor’s thesis

Speech data:

Recordings from films or series

Possible content:

You can compare the original voices with their German dubbed counterparts to determine whether the dubbed voice is a suitable fit. You can also explore whether the dubbed voice matches or contradicts the character's personality (this can be tested with a perception test). Another option is to examine cases where one voice actor dubs several actors or, conversely, where an actor's dubbing voice changes.

Methods for forensic voice comparison

Description will be provided shortly.

Can lies be detected on the basis of voice and the way of speaking?

Description will be provided shortly.

Pathological voice tremor

Description will be provided shortly.