1
An empirical analysis of information encoded in disentangled neural speaker representations ...
Abstract:
The primary characteristic of robust speaker representations is that they are invariant to factors of variability not related to speaker identity. Disentanglement of speaker representations is one of the techniques used to improve robustness of speaker representations to both intrinsic factors that are acquired during speech production (e.g., emotion, lexical content) and extrinsic factors that are acquired during signal capture (e.g., channel, noise). Disentanglement in neural speaker representations can be achieved either in a supervised fashion with annotations of the nuisance factors (factors not related to speaker identity) or in an unsupervised fashion without labels of the factors to be removed. In either case, it is important to understand the extent to which the various factors of variability are entangled in the representations. In this work, we examine speaker representations with and without unsupervised disentanglement for the amount of information they capture related to a suite of factors. ... : Submitted to Speaker Odyssey 2020 ...
Keywords:
Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2002.03520 https://arxiv.org/abs/2002.03520
BASE
2
Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions using Speech and Language ...
BASE
3
Learning from Past Mistakes: Improving Automatic Speech Recognition Output via Noisy-Clean Phrase Context Modeling ...
BASE