1 |
Emotion Intensity and its Control for Emotional Voice Conversion ...
2 |
Dawn of the transformer era in speech emotion recognition: closing the valence gap ...
Abstract:
Recent advances in transformer-based architectures which are pre-trained in a self-supervised manner have shown great promise in several machine learning tasks. In the audio domain, such architectures have also been successfully utilised in the field of speech emotion recognition (SER). However, existing works have not evaluated the influence of model size and pre-training data on downstream performance, and have paid limited attention to generalisation, robustness, fairness, and efficiency. The present contribution conducts a thorough analysis of these aspects on several pre-trained variants of wav2vec 2.0 and HuBERT that we fine-tuned on the dimensions arousal, dominance, and valence of MSP-Podcast, while additionally using IEMOCAP and MOSI to test cross-corpus generalisation. To the best of our knowledge, we obtain the top performance for valence prediction without use of explicit linguistic information, with a concordance correlation coefficient (CCC) of .638 on MSP-Podcast. Furthermore, our ...
Keyword:
Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2203.07378 https://arxiv.org/abs/2203.07378
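The abstract reports valence performance as a concordance correlation coefficient (CCC). For reference, Lin's CCC between gold labels x and predictions y is ρ_c = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), so unlike Pearson's r it also penalises differences in mean and scale. The sketch below computes it in plain Python; the function name is hypothetical (not taken from the paper's code), and it assumes population (1/N) moments, which is one common convention.

```python
from statistics import fmean


def concordance_correlation_coefficient(y_true, y_pred):
    """Lin's CCC between gold labels and predictions (hypothetical helper).

    Assumes population (1/N) moments for variances and covariance.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_t = fmean(y_true)
    mean_p = fmean(y_pred)
    # Population variances of each sequence.
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    # Population covariance between labels and predictions.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    # The squared mean difference is what separates CCC from Pearson's r.
    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


if __name__ == "__main__":
    print(concordance_correlation_coefficient([1, 2, 3], [2, 3, 4]))
```

Shifting every prediction by a constant leaves Pearson's r at 1 but lowers the CCC: gold labels [1, 2, 3] against predictions [2, 3, 4] give 4/7 ≈ 0.571.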
3 |
Probing Speech Emotion Recognition Transformers for Linguistic Knowledge ...
4 |
An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation ...
5 |
The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates ...
6 |
On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era ...
7 |
Multistage linguistic conditioning of convolutional layers for speech emotion recognition ...
8 |
A Comparison of Acoustic and Linguistics Methodologies for Alzheimer's Dementia Recognition
In: http://infoscience.epfl.ch/record/284990 (2021)
9 |
The voice of COVID-19: Acoustic correlates of infection in sustained vowels
In: J Acoust Soc Am (2021)
10 |
COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis
In: Front Digit Health (2021)
11 |
AI-based Human Audio Processing for COVID-19: A Comprehensive Overview
In: Pattern Recognit (2021)
12 |
Face Mask Recognition from Audio: The MASC Database and an Overview on the Mask Challenge
In: Pattern Recognit (2021)
13 |
Audio, Speech, Language, & Signal Processing for COVID-19: A Comprehensive Overview ...
14 |
Speaker trait characterization in web videos: Uniting speech, language, and facial features
In: Proceedings of the 38th International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013), pp. 3647-3651 (2020)
15 |
On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues
16 |
Discrimination of speech and non-linguistic vocalizations by non-negative matrix factorization
17 |
A comparison of acoustic and linguistics methodologies for Alzheimer's dementia recognition
18 |
"The Godfather" vs. "Chaos": comparing linguistic analysis based on on-line knowledge sources and Bags-of-N-Grams for movie review valence estimation
19 |
Speaker independent emotion recognition by early fusion of acoustic and linguistic features within ensembles
20 |
On the influence of phonetic content variation for acoustic emotion recognition