1 |
A Bottleneck Auto-Encoder for F0 Transformations on Speech and Singing Voice
|
|
|
|
In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599085 ; Information, MDPI, 2022, 13 (3), pp.102. ⟨10.3390/info13030102⟩ (2022)
|
|
BASE
|
|
Show details
|
|
2 |
Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet
|
|
|
|
In: ISSN: 2078-2489 ; Information ; https://hal.archives-ouvertes.fr/hal-03599076 ; Information, MDPI, 2022, 13 (3), pp.103. ⟨10.3390/info13030103⟩ (2022)
|
|
BASE
|
|
Show details
|
|
3 |
Learning and controlling the source-filter representation of speech with a variational autoencoder
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03650569 ; 2022 (2022)
|
|
BASE
|
|
Show details
|
|
4 |
Analyzing the impact of speaker localization errors on speech separation for automatic speech recognition
|
|
|
|
In: EUSIPCO 2020 - 28th European Signal Processing Conference ; https://hal.inria.fr/hal-02355669 ; EUSIPCO 2020 - 28th European Signal Processing Conference, Jan 2021, Amsterdam / Virtual, Netherlands. ⟨10.23919/Eusipco47968.2020.9287541⟩ ; https://eusipco2020.org/ (2021)
|
|
BASE
|
|
Show details
|
|
5 |
High-resolution speaker counting in reverberant rooms using CRNN with Ambisonics features
|
|
|
|
In: EUSIPCO 2020 - 28th European Signal Processing Conference (EUSIPCO) ; https://hal.archives-ouvertes.fr/hal-03537323 ; EUSIPCO 2020 - 28th European Signal Processing Conference (EUSIPCO), Jan 2021, Amsterdam, Netherlands. pp.71-75, ⟨10.23919/Eusipco47968.2020.9287637⟩ (2021)
|
|
Abstract:
International audience ; Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speakers at each timestep is a prerequisite, or at least it can be a strong advantage, in addition to enabling a low latency processing. For that purpose, we address the speaker counting problem with a multichannel convolutional recurrent neural network which produces an estimation at a short-term frame resolution. We trained the network to predict up to 5 concurrent speakers in a multichannel mixture, with simulated data including many different conditions in terms of source and microphone positions, reverberation, and noise. The network can predict the number of speakers with good accuracy at frame resolution.
|
|
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; CRNN; reverberation; Speaker counting
|
|
URL: https://doi.org/10.23919/Eusipco47968.2020.9287637 https://hal.archives-ouvertes.fr/hal-03537323 https://hal.archives-ouvertes.fr/hal-03537323/file/eusipco2020.pdf https://hal.archives-ouvertes.fr/hal-03537323/document
|
|
BASE
|
|
Hide details
|
|
6 |
Automatic Speech Recognition systems errors for accident-prone sleepiness detection through voice
|
|
|
|
In: EUSIPCO 2021 ; https://hal.archives-ouvertes.fr/hal-03324033 ; EUSIPCO 2021, Aug 2021, Dublin (en ligne), Ireland. ⟨10.23919/EUSIPCO54536.2021.9616299⟩ (2021)
|
|
BASE
|
|
Show details
|
|
7 |
Automatic Speech Recognition systems errors for objective sleepiness detection through voice
|
|
|
|
In: Proceedings Interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03328827 ; Interspeech 2021, Aug 2021, Brno (virtual), Czech Republic. pp.2476-2480, ⟨10.21437/Interspeech.2021-291⟩ (2021)
|
|
BASE
|
|
Show details
|
|
8 |
Speaker Attentive Speech Emotion Recognition
|
|
|
|
In: Proccedings of interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03554368 ; Interspeech 2021, Aug 2021, Brno, Czech Republic. pp.2866-2870, ⟨10.21437/interspeech.2021-573⟩ (2021)
|
|
BASE
|
|
Show details
|
|
9 |
Prosodic Boundary Prediction Model for Vietnamese Text-To-Speech
|
|
|
|
In: Proc. Interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03329116 ; Interspeech 2021, Aug 2021, Brno, Czech Republic. pp.3885-3889, ⟨10.21437/interspeech.2021-125⟩ (2021)
|
|
BASE
|
|
Show details
|
|
10 |
Learning robust speech representation with an articulatory-regularized variational autoencoder
|
|
|
|
In: Proccedings of Interspeech 2021 ; Interspeech 2021 - 22nd Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-03373252 ; Interspeech 2021 - 22nd Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic (2021)
|
|
BASE
|
|
Show details
|
|
11 |
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
|
|
|
|
In: INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329245 ; INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic (2021)
|
|
BASE
|
|
Show details
|
|
12 |
Learning spectro-temporal representations of complex sounds with parameterized neural networks
|
|
|
|
In: ISSN: 0001-4966 ; EISSN: 1520-8524 ; Journal of the Acoustical Society of America ; https://hal.inria.fr/hal-03329261 ; Journal of the Acoustical Society of America, Acoustical Society of America, 2021, 150 (1), pp.353-366. ⟨10.1121/10.0005482⟩ (2021)
|
|
BASE
|
|
Show details
|
|
13 |
Large vocabulary automatic speech recognition: from hybrid to end-to-end approaches ; Reconnaissance automatique de la parole à large vocabulaire : des approches hybrides aux approches End-to-End
|
|
|
|
In: https://hal.archives-ouvertes.fr/tel-03269807 ; Son [cs.SD]. Université toulouse 3 Paul Sabatier, 2021. Français (2021)
|
|
BASE
|
|
Show details
|
|
14 |
Leveraging lyrics from audio for MIR ; Exploiter les paroles de chansons à partir de l'audio pour le MIR
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-03558515 ; Signal and Image processing. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT027⟩ (2021)
|
|
BASE
|
|
Show details
|
|
15 |
Prosodic Disambiguation Using Chironomic Stylization of Intonation with Native and Non-Native Speakers
|
|
|
|
In: Proceedings Interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03329111 ; Interspeech 2021, Aug 2021, Brno (virtual), Czech Republic. pp.516-520, ⟨10.21437/Interspeech.2021-182⟩ (2021)
|
|
BASE
|
|
Show details
|
|
16 |
Non-Parametric Bayesian Subspace Models for Acoustic Unit Discovery
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03467205 ; 2021 (2021)
|
|
BASE
|
|
Show details
|
|
17 |
Vocal drum sounds in human beatboxing: An acoustic and articulatory exploration using electromagnetic articulography
|
|
|
|
In: ISSN: 0001-4966 ; EISSN: 1520-8524 ; Journal of the Acoustical Society of America ; https://hal.univ-grenoble-alpes.fr/hal-03107358 ; Journal of the Acoustical Society of America, Acoustical Society of America, 2021, 149 (1), pp.191-206. ⟨10.1121/10.0002921⟩ ; https://asa.scitation.org/doi/full/10.1121/10.0002921 (2021)
|
|
BASE
|
|
Show details
|
|
18 |
Perceptual equivalence of the Liljencrants-Fant and linear-filter glottal flow models
|
|
|
|
In: ISSN: 0001-4966 ; EISSN: 1520-8524 ; Journal of the Acoustical Society of America ; https://hal.archives-ouvertes.fr/hal-03322875 ; Journal of the Acoustical Society of America, Acoustical Society of America, 2021, 150 (2), pp.1273-1285. ⟨10.1121/10.0005879⟩ ; https://doi.org/10.1121/10.0005879 (2021)
|
|
BASE
|
|
Show details
|
|
19 |
End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings
|
|
|
|
In: ISBN: 978-1-6654-0019-0 ; 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII) ; https://hal.archives-ouvertes.fr/hal-03405970 ; 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), Sep 2021, Nara, Japan ; https://www.acii-conf.net/2021/ (2021)
|
|
BASE
|
|
Show details
|
|
20 |
Automated audio captioning by fine-tuning bart with audioset tags
|
|
|
|
In: DCASE 2021 - 6th Workshop on Detection and Classification of Acoustic Scenes and Events ; https://hal.inria.fr/hal-03522488 ; DCASE 2021 - 6th Workshop on Detection and Classification of Acoustic Scenes and Events, Nov 2021, Virtual, Spain (2021)
|
|
BASE
|
|
Show details
|
|
|
|