Search in the Catalogues and Directories

Hits 1 – 5 of 5

1. Towards a Perceptual Model for Estimating the Quality of Visual Speech ... (BASE)
2. Learning Paralinguistic Features from Audiobooks through Style Voice Conversion ... (BASE)
Aldeneh, Zakaria; Perez, Matthew. NAACL 2021. Underline Science Inc., 2021.
3. Robust Methods for the Automatic Quantification and Prediction of Affect in Spoken Interactions (BASE)
Abstract: Emotional expression plays a key role in interactions, as it communicates the context needed for understanding the behaviors and intentions of individuals. Therefore, a speech-based Artificial Intelligence (AI) system that can recognize and interpret emotional expression has many potential applications, with measurable impact on a variety of areas including human-computer interaction (HCI) and healthcare. However, several factors make speech emotion recognition (SER) a difficult task: variability in speech data, variability in emotion annotations, and data sparsity. This dissertation explores methodologies for improving the robustness of the automatic recognition of emotional expression from speech by addressing the impact of these factors on various aspects of the SER pipeline. To address variability in speech data, we propose modeling techniques that improve SER performance by leveraging short-term dynamical properties of speech. Furthermore, we demonstrate how data augmentation improves SER robustness to speaker variations. Lastly, we find that we can make more accurate predictions of emotion by considering the fine-grained interactions between the acoustic and lexical components of speech. To address variability in emotion annotations, we propose SER modeling techniques that account for annotator behavior (i.e., annotators' reaction delay) to improve the robustness of time-continuous SER. To address data sparsity, we investigate two methods for learning robust embeddings that highlight the differences between neutral speech and emotionally expressive speech without requiring emotion annotations. In the first method, we demonstrate how emotionally charged vocal expressions change speaker characteristics as captured by embeddings extracted from a speaker identification model, and we propose the use of these embeddings in SER applications. In the second method, we propose a framework for learning emotion embeddings from audio-textual data that is not annotated for emotion. Together, the methods and results presented in this thesis enable the development of more robust SER systems, marking key advancements toward an interactive speech-based AI system capable of recognizing and interpreting human behaviors.
Degree: PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
Full text: http://deepblue.lib.umich.edu/bitstream/2027.42/166106/1/aldeneh_1.pdf
Keywords: affective computing; applied machine learning; Computer Science; Engineering; speech emotion recognition; speech processing
URL: https://hdl.handle.net/2027.42/166106
DOI: https://doi.org/10.7302/29
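The abstract above notes that data augmentation improves SER robustness to speaker variations. As a rough illustration only, and not the dissertation's actual method, the sketch below produces tempo- and pitch-perturbed copies of a training utterance with librosa; the file path and perturbation factors are illustrative assumptions.

    # A minimal sketch of speaker-variation data augmentation for SER
    # training data. NOT the dissertation's method; the perturbation
    # factors and the path "utterance.wav" are illustrative assumptions.
    import librosa
    import numpy as np

    def augment_utterance(y: np.ndarray, sr: int) -> list:
        """Return tempo- and pitch-perturbed copies of a waveform.

        Varying tempo and pitch simulates speaker variability, so an SER
        model trained on the augmented set depends less on any one voice.
        """
        variants = []
        for rate in (0.9, 1.1):  # mild tempo changes, +/- 10%
            variants.append(librosa.effects.time_stretch(y, rate=rate))
        for n_steps in (-2, 2):  # shift pitch by +/- 2 semitones
            variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps))
        return variants

    # Load one 16 kHz utterance and produce four augmented copies of it.
    y, sr = librosa.load("utterance.wav", sr=16000)
    augmented = augment_utterance(y, sr)

Training on the original utterance plus such perturbed copies exposes the model to a wider range of voice characteristics without collecting new recordings.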
4. Controlling for Confounders in Multimodal Emotion Classification via Adversarial Learning ... (BASE)
5. Identifying Mood Episodes Using Dialogue Features from Clinical Interviews ... (BASE)

Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 5
© 2013 – 2024 Lin|gu|is|tik