1 |
Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding
|
|
|
|
In: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-03578503 ; ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, May 2022, Singapore, Singapore (2022)
|
|
BASE
|
|
2 |
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
|
|
|
|
In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
|
|
BASE
|
|
3 |
Differentially private speaker anonymization
|
|
|
|
In: https://hal.inria.fr/hal-03588932 ; 2022 (2022)
|
|
BASE
|
|
4 |
A Study of F0 Modification for X-Vector Based Speech Pseudonymization Across Gender
|
|
|
|
In: PPAI 2021 - The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence ; https://hal.archives-ouvertes.fr/hal-02995862 ; PPAI 2021 - The Second AAAI Workshop on Privacy-Preserving Artificial Intelligence, Feb 2021, Virtual, China (2021)
|
|
BASE
|
|
5 |
Assessment of adult speech disorders: current situation and needs in French-speaking clinical practice
|
|
|
|
In: ISSN: 1401-5439 ; Logopedics Phoniatrics Vocology ; https://hal.archives-ouvertes.fr/hal-03120115 ; Logopedics Phoniatrics Vocology, Taylor & Francis, 2021, pp.1-15. ⟨10.1080/14015439.2020.1870245⟩ (2021)
|
|
BASE
|
|
6 |
Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework
|
|
|
|
In: ISSN: 1381-2416 ; EISSN: 1572-8110 ; International Journal of Speech Technology ; https://hal.archives-ouvertes.fr/hal-03232723 ; International Journal of Speech Technology, Springer Verlag, In press, ⟨10.1007/s10772-021-09862-8⟩ (2021)
|
|
BASE
|
|
7 |
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
|
|
|
|
In: Interspeech 2021 - 22nd Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-03372802 ; Interspeech 2021 - 22nd Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. pp.3865-3869, ⟨10.21437/Interspeech.2021-275⟩ (2021)
|
|
BASE
|
|
8 |
Speaker Attentive Speech Emotion Recognition
|
|
|
|
In: Proceedings of Interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03554368 ; Interspeech 2021, Aug 2021, Brno, Czech Republic. pp.2866-2870, ⟨10.21437/interspeech.2021-573⟩ (2021)
|
|
BASE
|
|
9 |
EVOLEX : la reconnaissance vocale au service du diagnostic des dysfonctionnements langagiers [EVOLEX: speech recognition in the service of diagnosing language dysfunctions]
|
|
|
|
In: Séminaire AFCP 2021 – Phonétique Clinique ; https://hal-univ-tlse3.archives-ouvertes.fr/hal-03269242 ; Séminaire AFCP 2021 – Phonétique Clinique, May 2021, Toulouse (virtual), France ; http://www.afcp-parole.org/seminaire-afcp-phonetique-clinique-27-mai-2021/ (2021)
|
|
BASE
|
|
10 |
Recognizing lexical units in low-resource language contexts with supervised and unsupervised neural networks
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03429051 ; [Research Report] LACITO (UMR 7107). 2021 (2021)
|
|
BASE
|
|
11 |
Automatic extraction of speech rhythm descriptors for speech intelligibility assessment in the context of Head and Neck Cancers
|
|
|
|
In: to appear ; INTERSPEECH 2021 ; https://hal-univ-tlse3.archives-ouvertes.fr/hal-03269227 ; INTERSPEECH 2021, ISCA : International Speech and Communication Association, Aug 2021, Brno, Czech Republic ; https://www.interspeech2021.org (2021)
|
|
Abstract:
The temporal dimension of speech acoustics is rarely taken into account in automatic models for speech intelligibility evaluation, although the rhythmic recurrence of phonemes, syllables and prosodic groups is allegedly a good predictor of speech intelligibility. The present study aims to identify the automatically extracted parameters that best account for the different levels of the speech signal's rhythmic structure, and to evaluate their correlation with a perceptual intelligibility measure. The parameters are extracted from the Fourier transform of the amplitude modulation of the signal (Envelope Modulation Spectrum, EMS) [1, 2]. A Lasso linear model is first used for feature selection to retain the most relevant parameters, and an SVR analysis is then run to reveal the best combination of parameters. Our analyses of EMS, using data from the French C2SI corpus of cancer speech [3], show strong performance of the automatic prediction, with a correlation of 0.70 between our model and an intelligibility score assigned by speech pathologists. In particular, the highest correlation with speech intelligibility lies in the ratio between the energy in the low frequency band (0.5-4 Hz, which represents slow rhythmic modulations indicative of prosodic groups) and the energy in the higher band (4-10 Hz, which represents fast rhythmic modulations such as phonemes).
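The band-energy ratio described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the envelope extraction (full-wave rectification followed by block averaging), the 100 Hz envelope rate `env_rate`, and all function names are assumptions chosen for illustration. Only the two band limits (0.5-4 Hz and 4-10 Hz) come from the abstract, and the Lasso and SVR stages are not shown.

```python
import numpy as np


def envelope_modulation_spectrum(signal, sample_rate, env_rate=100):
    """Return (freqs, power) of the amplitude envelope of `signal`.

    Assumption: the envelope is approximated by rectifying the signal and
    averaging it over non-overlapping blocks, giving `env_rate` values per second.
    """
    hop = sample_rate // env_rate                  # samples per envelope value
    n_frames = len(signal) // hop
    blocks = np.abs(signal[: n_frames * hop]).reshape(n_frames, hop)
    envelope = blocks.mean(axis=1)
    envelope = envelope - envelope.mean()          # drop the DC component
    power = np.abs(np.fft.rfft(envelope)) ** 2     # modulation power spectrum
    freqs = np.fft.rfftfreq(n_frames, d=1.0 / env_rate)
    return freqs, power


def band_energy(freqs, power, low_hz, high_hz):
    """Sum the modulation power for low_hz <= f < high_hz."""
    mask = (freqs >= low_hz) & (freqs < high_hz)
    return float(power[mask].sum())


def low_high_ratio(signal, sample_rate):
    """Energy in the 0.5-4 Hz band divided by energy in the 4-10 Hz band."""
    freqs, power = envelope_modulation_spectrum(signal, sample_rate)
    low = band_energy(freqs, power, 0.5, 4.0)
    high = band_energy(freqs, power, 4.0, 10.0)
    return float("inf") if high == 0.0 else low / high


def modulated_tone(mod_hz, sample_rate=16000, seconds=4.0, carrier_hz=200.0):
    """Synthetic test signal: a carrier tone amplitude-modulated at `mod_hz`."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return (1.0 + 0.5 * np.sin(2 * np.pi * mod_hz * t)) * np.sin(2 * np.pi * carrier_hz * t)


if __name__ == "__main__":
    slow = low_high_ratio(modulated_tone(2.0), 16000)   # slow, prosody-like rhythm
    fast = low_high_ratio(modulated_tone(6.0), 16000)   # fast, phoneme-like rhythm
    print("ratio above 1 for 2 Hz modulation:", slow > 1.0)
    print("ratio below 1 for 6 Hz modulation:", fast < 1.0)
```

On the synthetic tones, a 2 Hz modulation puts its energy in the low band and a 6 Hz modulation in the high band, so the ratio moves in opposite directions. On real recordings, a smoother envelope estimate (for example a low-pass filtered Hilbert envelope) would likely be preferable to the block averaging assumed here.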
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; Automatic Speech Processing; pathological speech; perceptual speech intelligibility; speech rhythm modeling
|
|
URL: https://hal-univ-tlse3.archives-ouvertes.fr/hal-03269227/file/Interspeech2021_1736_Paper.pdf https://hal-univ-tlse3.archives-ouvertes.fr/hal-03269227 https://hal-univ-tlse3.archives-ouvertes.fr/hal-03269227/document
|
|
BASE
|
|
12 |
Prosodic Boundary Prediction Model for Vietnamese Text-To-Speech
|
|
|
|
In: Proc. Interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03329116 ; Interspeech 2021, Aug 2021, Brno, Czech Republic. pp.3885-3889, ⟨10.21437/interspeech.2021-125⟩ (2021)
|
|
BASE
|
|
13 |
“Motherese” Prosody in Fetal-Directed Speech: An Exploratory Study Using Automatic Social Signal Processing
|
|
|
|
In: ISSN: 1664-1078 ; Frontiers in Psychology ; https://hal.sorbonne-universite.fr/hal-03184038 ; Frontiers in Psychology, Frontiers, 2021, 12, ⟨10.3389/fpsyg.2021.646170⟩ (2021)
|
|
BASE
|
|
14 |
Learning spectro-temporal representations of complex sounds with parameterized neural networks
|
|
|
|
In: ISSN: 0001-4966 ; EISSN: 1520-8524 ; Journal of the Acoustical Society of America ; https://hal.inria.fr/hal-03329261 ; Journal of the Acoustical Society of America, Acoustical Society of America, 2021, 150 (1), pp.353-366. ⟨10.1121/10.0005482⟩ (2021)
|
|
BASE
|
|
15 |
Modeling the effect of military oxygen masks on speech characteristics
|
|
|
|
In: Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03325087 ; Interspeech 2021, Aug 2021, Brno, Czech Republic (2021)
|
|
BASE
|
|
16 |
MRI Vocal Tract Sagittal Slices Estimation during Speech Production of CV
|
|
|
|
In: EUSIPCO 2020 - 28th European Signal Processing Conference ; https://hal.inria.fr/hal-03090824 ; EUSIPCO 2020 - 28th European Signal Processing Conference, Jan 2021, Amsterdam / Virtual, Netherlands ; https://eusipco2020.org/ (2021)
|
|
BASE
|
|
17 |
Construction of an automatic score for the evaluation of speech disorders among patients treated for a cancer of the oral cavity or the oropharynx: The Carcinologic Speech Severity Index
|
|
|
|
In: ISSN: 1043-3074 ; EISSN: 1097-0347 ; Head and Neck ; https://hal-univ-tlse3.archives-ouvertes.fr/hal-03413678 ; Head and Neck, Wiley, In press, ⟨10.1002/hed.26903⟩ (2021)
|
|
BASE
|
|
18 |
Automated Assessment of Glottal Dysfunction Through Unified Acoustic Voice Analysis
|
|
|
|
In: ISSN: 0892-1997 ; Journal of Voice ; https://hal.archives-ouvertes.fr/hal-02987882 ; Journal of Voice, Elsevier, In press, ⟨10.1016/j.jvoice.2020.08.032⟩ (2021)
|
|
BASE
|
|
19 |
Leveraging lyrics from audio for MIR ; Exploiter les paroles de chansons à partir de l'audio pour le MIR
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-03558515 ; Signal and Image processing. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT027⟩ (2021)
|
|
BASE
|
|
20 |
On the effect of normalization layers on Differentially Private training of deep Neural networks
|
|
|
|
In: https://hal.inria.fr/hal-03475600 ; 2021 (2021)
|
|
BASE
|
|