1 | A comparative study of several parameterizations for speaker recognition ... | BASE
2 | Speaker verification in mismatch training and testing conditions ... | BASE
3 | Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation ... | BASE
4 | A New Amharic Speech Emotion Dataset and Classification Benchmark ... | BASE
5 | Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks ... | BASE
7 | Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition ... | BASE
8 | LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects ... | BASE
9 | Automatic Dialect Density Estimation for African American English ... | BASE
10 | End-to-end contextual asr based on posterior distribution adaptation for hybrid ctc/attention system ... | BASE
11 | Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems ... | BASE
12 | SHAS: Approaching optimal Segmentation for End-to-End Speech Translation ... | BASE
13 | Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations ... | BASE
14 | Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition ... | BASE
15 | Towards a Perceptual Model for Estimating the Quality of Visual Speech ... | BASE
16 | Learning and controlling the source-filter representation of speech with a variational autoencoder ...
Abstract:
Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspired by the anatomical mechanisms of phonation, the source-filter model assumes that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we show that the source-filter model of speech production naturally arises in the latent space of a variational autoencoder (VAE) trained in an unsupervised manner on a dataset of natural speech signals. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we experimentally illustrate that $f_0$ and the formant frequencies are encoded in orthogonal subspaces of the VAE latent space, and we develop a weakly-supervised method to accurately and independently control ... : 17 pages, 4 figures, companion website: https://samsad35.github.io/site-sfvae/ ...
Keyword:
Audio and Speech Processing (eess.AS); FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning (cs.LG); Sound (cs.SD)
URL: https://arxiv.org/abs/2204.07075 https://dx.doi.org/10.48550/arxiv.2204.07075
BASE
17 | Correcting Misproducted Speech using Spectrogram Inpainting ... | BASE
18 | Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation ... | BASE
19 | Can Social Robots Effectively Elicit Curiosity in STEM Topics from K-1 Students During Oral Assessments? ... | BASE
20 | An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production ... | BASE