Hits 1 – 20 of 1,597

1	A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice
	Bous, Frederik; Roebel, Axel
	In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599085 ; Information, MDPI, 2022, 13 (3), pp.102. ⟨10.3390/info13030102⟩ (2022)
	Abstract: In this publication, we present a deep learning-based method to transform the f0 in speech and singing voice recordings. f0 transformation is performed by training an auto-encoder on the voice signal’s mel-spectrogram and conditioning the auto-encoder on the f0. Inspired by AutoVC/F0, we apply an information bottleneck to it to disentangle the f0 from its latent code. The resulting model successfully applies the desired f0 to the input mel-spectrograms and adapts the speaker identity when necessary, e.g., if the requested f0 falls out of the range of the source speaker/singer. Using the mean f0 error in the transformed mel-spectrograms, we define a disentanglement measure and perform a study over the required bottleneck size. The study reveals that to remove the f0 from the auto-encoder’s latent code, the bottleneck size should be smaller than four for singing and smaller than nine for speech. Through a perceptive test, we compare the audio quality of the proposed auto-encoder to f0 transformations obtained with a classical vocoder. The perceptive test confirms that the audio quality is better for the auto-encoder than for the classical vocoder. Finally, a visual analysis of the latent code for the two-dimensional case is carried out. We observe that the auto-encoder encodes phonemes as repeated discontinuous temporal gestures within the latent code.
	Keyword: [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing
	URL: https://doi.org/10.3390/info13030102 https://hal.archives-ouvertes.fr/hal-03599085
	BASE

2	Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet
	Roebel, Axel; Bous, Frederik
	In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599076 ; Information, MDPI, 2022, 13 (3), pp.103. ⟨10.3390/info13030103⟩ (2022)
	BASE

3	Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding
	Sankar, Sanjana; Beautemps, Denis; Hueber, Thomas
	In: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-03578503 ; ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022, Singapour, Singapore (2022)
	BASE

4	An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
	Dey, Spandan; Sahidullah, Md; Saha, Goutam
	In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
	BASE

5	Etude de cas de pathologies de la parole dans le cadre de la prise en charge orthophonique
	Sicard, Etienne; Menin-Sicard, Anne; Michel, Sandrine...
	In: https://hal.archives-ouvertes.fr/hal-03568182 ; 2022 (2022)
	BASE

6	Differentially private speaker anonymization
	Shamsabadi, Ali Shahin; Srivastava, Brij Mohan Lal; Bellet, Aurélien...
	In: https://hal.inria.fr/hal-03588932 ; 2022 (2022)
	BASE

7	Automatic assessment of oral readings of young pupils
	Bailly, Gérard; Godde, Erika; Piat-Marchand, Anne-Laure...
	In: ISSN: 0167-6393 ; EISSN: 1872-7182 ; Speech Communication ; https://hal.archives-ouvertes.fr/hal-03585934 ; Speech Communication, Elsevier : North-Holland, 2022, 138, pp.67-79. ⟨10.1016/j.specom.2022.01.008⟩ ; https://www.sciencedirect.com/science/article/pii/S0167639322000164?via%3Dihub (2022)
	BASE

8	Unsupervised quantification of entity consistency between photos and text in real-world news ...
	Müller-Budack, Eric. - : Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2022
	BASE

9	Danish Fungi 2020
	Picek, Lukáš; Šulc, Milan; Matas, Jiří. - : IEEE/CVF, 2022
	BASE

10	Principles of Learning in Multitask Settings: A Probabilistic Perspective ...
	Al-shedivat, Maruan. - : Carnegie Mellon University, 2022
	BASE

11	Principles of Learning in Multitask Settings: A Probabilistic Perspective ...
	Al-shedivat, Maruan. - : Carnegie Mellon University, 2022
	BASE

12	The 2021 NIST Speaker Recognition Evaluation ...
	Sadjadi, Seyed Omid; Greenberg, Craig; Singer, Elliot. - : arXiv, 2022
	BASE

13	Cross-view Brain Decoding ...
	Oota, Subba Reddy; Arora, Jashn; Gupta, Manish. - : arXiv, 2022
	BASE

14	Learning English with Peppa Pig ...
	Nikolaus, Mitja; Alishahi, Afra; Chrupała, Grzegorz. - : arXiv, 2022
	BASE

15	Who has ears, listen: Citizen Listening Program for disease prevention. ...
	García Pereira, Ramiro. - : figshare, 2022
	BASE

16	Who has ears, listen: Citizen Listening Program for disease prevention. ...
	García Pereira, Ramiro. - : figshare, 2022
	BASE

17	Segmentation of Glottal Images from High-Speed Videoendoscopy Optimized by Synchronous Acoustic Recordings
	Bartosz Kopczynski; Ewa Niebudek-Bogusz; Wioletta Pietruszewska; Pawel Strumillo
	In: Sensors; Volume 22; Issue 5; Pages: 1751 (2022)
	BASE

18	Connecting Text Classification with Image Classification: A New Preprocessing Method for Implicit Sentiment Text Classification
	Meikang Chen; Kurban Ubul; Xuebin Xu; Alimjan Aysa; Mahpirat Muhammat
	In: Sensors; Volume 22; Issue 5; Pages: 1899 (2022)
	BASE

19	A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender
	Champion, Pierre; Jouvet, Denis; Larcher, Anthony
	In: PPAI 2021 - The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence ; https://hal.archives-ouvertes.fr/hal-02995862 ; PPAI 2021 - The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence, Feb 2021, Virtual, China (2021)
	BASE

20	Assessment of adult speech disorders: current situation and needs in French-speaking clinical practice
	Pommée, Timothy; Balaguer, Mathieu; Mauclair, Julie...
	In: ISSN: 1401-5439 ; Logopedics Phoniatrics Vocology ; https://hal.archives-ouvertes.fr/hal-03120115 ; Logopedics Phoniatrics Vocology, Taylor & Francis, 2021, pp.1-15. ⟨10.1080/14015439.2020.1870245⟩ (2021)
	BASE
