
Search in the Catalogues and Directories

Hits 1 – 20 of 906

1
A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice
In: Information, MDPI, 2022, 13 (3), p. 102. ISSN: 2078-2489. DOI: 10.3390/info13030102
Abstract: In this publication, we present a deep learning-based method to transform the f0 in speech and singing voice recordings. The f0 transformation is performed by training an auto-encoder on the voice signal’s mel-spectrogram and conditioning the auto-encoder on the f0. Inspired by AutoVC/F0, we apply an information bottleneck to the auto-encoder to disentangle the f0 from its latent code. The resulting model successfully applies the desired f0 to the input mel-spectrograms and adapts the speaker identity when necessary, e.g., if the requested f0 falls outside the range of the source speaker/singer. Using the mean f0 error in the transformed mel-spectrograms, we define a disentanglement measure and study the required bottleneck size. The study reveals that to remove the f0 from the auto-encoder’s latent code, the bottleneck size should be smaller than four for singing and smaller than nine for speech. Through a perceptual test, we compare the audio quality of the proposed auto-encoder with that of f0 transformations obtained with a classical vocoder. The test confirms that the auto-encoder yields better audio quality than the classical vocoder. Finally, we carry out a visual analysis of the latent code for the two-dimensional case and observe that the auto-encoder encodes phonemes as repeated discontinuous temporal gestures within the latent code.
Keyword: [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing
URL: https://doi.org/10.3390/info13030102
https://hal.archives-ouvertes.fr/hal-03599085
BASE
2
Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet
In: Information, MDPI, 2022, 13 (3), p. 103. ISSN: 2078-2489. DOI: 10.3390/info13030103. https://hal.archives-ouvertes.fr/hal-03599076
BASE
3
Learning and controlling the source-filter representation of speech with a variational autoencoder
In: https://hal.archives-ouvertes.fr/hal-03650569 (2022)
BASE
4
A comparative study of several parameterizations for speaker recognition ...
Faundez-Zanuy, Marcos. arXiv, 2022
BASE
5
Speaker verification in mismatch training and testing conditions ...
BASE
6
Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation ...
BASE
7
A New Amharic Speech Emotion Dataset and Classification Benchmark ...
BASE
8
The Norwegian Parliamentary Speech Corpus ...
Solberg, Per Erik; Ortiz, Pablo. arXiv, 2022
BASE
9
Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition ...
BASE
10
LPC Augment: An LPC-Based ASR Data Augmentation Algorithm for Low and Zero-Resource Children's Dialects ...
BASE
11
Automatic Dialect Density Estimation for African American English ...
BASE
12
End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system ...
Zhang, Zhengyi; Zhou, Pan. arXiv, 2022
BASE
13
Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems ...
BASE
14
SHAS: Approaching optimal Segmentation for End-to-End Speech Translation ...
BASE
15
Automatic Detection of Speech Sound Disorder in Child Speech Using Posterior-based Speaker Representations ...
BASE
16
Towards a Perceptual Model for Estimating the Quality of Visual Speech ...
BASE
17
Learning and controlling the source-filter representation of speech with a variational autoencoder ...
BASE
18
Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation ...
BASE
19
Can Social Robots Effectively Elicit Curiosity in STEM Topics from K-1 Students During Oral Assessments? ...
BASE
20
Expression-preserving face frontalization improves visually assisted speech processing ...
BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings