
Search in the Catalogues and Directories

Hits 1 – 20 of 1,597

1
A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice
In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599085 ; Information, MDPI, 2022, 13 (3), pp.102. ⟨10.3390/info13030102⟩ (2022)
Abstract: In this publication, we present a deep learning-based method to transform the f0 in speech and singing voice recordings. f0 transformation is performed by training an auto-encoder on the voice signal’s mel-spectrogram and conditioning the auto-encoder on the f0. Inspired by AutoVC/F0, we apply an information bottleneck to it to disentangle the f0 from its latent code. The resulting model successfully applies the desired f0 to the input mel-spectrograms and adapts the speaker identity when necessary, e.g., if the requested f0 falls out of the range of the source speaker/singer. Using the mean f0 error in the transformed mel-spectrograms, we define a disentanglement measure and perform a study over the required bottleneck size. The study reveals that to remove the f0 from the auto-encoder’s latent code, the bottleneck size should be smaller than four for singing and smaller than nine for speech. Through a perceptive test, we compare the audio quality of the proposed auto-encoder to f0 transformations obtained with a classical vocoder. The perceptive test confirms that the audio quality is better for the auto-encoder than for the classical vocoder. Finally, a visual analysis of the latent code for the two-dimensional case is carried out. We observe that the auto-encoder encodes phonemes as repeated discontinuous temporal gestures within the latent code.
Keyword: [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing
URL: https://doi.org/10.3390/info13030102
https://hal.archives-ouvertes.fr/hal-03599085
BASE
2
Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet
In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599076 ; Information, MDPI, 2022, 13 (3), pp.103. ⟨10.3390/info13030103⟩ (2022)
BASE
3
Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding
In: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-03578503 ; ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022, Singapour, Singapore (2022)
BASE
4
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
BASE
5
Etude de cas de pathologies de la parole dans le cadre de la prise en charge orthophonique
In: https://hal.archives-ouvertes.fr/hal-03568182 ; 2022 (2022)
BASE
6
Differentially private speaker anonymization
In: https://hal.inria.fr/hal-03588932 ; 2022 (2022)
BASE
7
Automatic assessment of oral readings of young pupils
In: ISSN: 0167-6393 ; EISSN: 1872-7182 ; Speech Communication ; https://hal.archives-ouvertes.fr/hal-03585934 ; Speech Communication, Elsevier : North-Holland, 2022, 138, pp.67-79. ⟨10.1016/j.specom.2022.01.008⟩ ; https://www.sciencedirect.com/science/article/pii/S0167639322000164?via%3Dihub (2022)
BASE
8
Unsupervised quantification of entity consistency between photos and text in real-world news ...
Müller-Budack, Eric. - : Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2022
BASE
9
Danish Fungi 2020
Picek, Lukáš; Šulc, Milan; Matas, Jiří. - : IEEE/CVF, 2022
BASE
10
Principles of Learning in Multitask Settings: A Probabilistic Perspective ...
Al-shedivat, Maruan. - : Carnegie Mellon University, 2022
BASE
11
Principles of Learning in Multitask Settings: A Probabilistic Perspective ...
Al-shedivat, Maruan. - : Carnegie Mellon University, 2022
BASE
12
The 2021 NIST Speaker Recognition Evaluation ...
BASE
13
Cross-view Brain Decoding ...
BASE
14
Learning English with Peppa Pig ...
BASE
15
Who has ears, listen: Citizen Listening Program for disease prevention. ...
García Pereira, Ramiro. - : figshare, 2022
BASE
16
Who has ears, listen: Citizen Listening Program for disease prevention. ...
García Pereira, Ramiro. - : figshare, 2022
BASE
17
Segmentation of Glottal Images from High-Speed Videoendoscopy Optimized by Synchronous Acoustic Recordings
In: Sensors; Volume 22; Issue 5; Pages: 1751 (2022)
BASE
18
Connecting Text Classification with Image Classification: A New Preprocessing Method for Implicit Sentiment Text Classification
In: Sensors; Volume 22; Issue 5; Pages: 1899 (2022)
BASE
19
A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender
In: PPAI 2021 - The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence ; https://hal.archives-ouvertes.fr/hal-02995862 ; PPAI 2021 - The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence, Feb 2021, Virtual, China (2021)
BASE
20
Assessment of adult speech disorders: current situation and needs in French-speaking clinical practice
In: ISSN: 1401-5439 ; Logopedics Phoniatrics Vocology ; https://hal.archives-ouvertes.fr/hal-03120115 ; Logopedics Phoniatrics Vocology, Taylor & Francis, 2021, pp.1-15. ⟨10.1080/14015439.2020.1870245⟩ (2021)
BASE

