1. Effective keyword search for low-resourced conversational speech
   In: ICASSP 2017, IEEE, Mar 2017, New Orleans, United States (2017). https://hal.archives-ouvertes.fr/hal-01744176

2. Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion
   In: Annual Conference of the International Speech Communication Association, ISCA, Aug 2017, Stockholm, Sweden (2017). https://hal.archives-ouvertes.fr/hal-01837178

3. Schwa Realization in French: Using Automatic Speech Processing to Study Phonological and Socio-linguistic Factors in Large Corpora
   In: Annual Conference of the International Speech Communication Association, ISCA, Aug 2017, Stockholm, Sweden (2017). https://hal.archives-ouvertes.fr/hal-01837179

4. Addressing Code-Switching in French/Algerian Arabic Speech
   In: Interspeech 2017, Aug 2017, Stockholm, Sweden, pp. 62-66, ⟨10.21437/interspeech.2017-1373⟩ (2017). https://halshs.archives-ouvertes.fr/halshs-01969148

5. An investigation into language model data augmentation for low-resourced STT and KWS
   In: IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, Mar 2017, New Orleans, United States (2017). https://hal.archives-ouvertes.fr/hal-01837171

6. Addressing Code-Switching in French/Algerian Arabic Speech
   In: Annual Conference of the International Speech Communication Association, ISCA, Aug 2017, Stockholm, Sweden (2017). https://hal.archives-ouvertes.fr/hal-01837206

7. Corpus-based linguistic exploration via forced alignments with a ‘light-weight’ ASR tool
   In: Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Nov 2017, Poznań, Poland (2017). https://hal.archives-ouvertes.fr/hal-01837174

8. Discovering speech reductions across speaking styles and languages
   In: Cangemi, F., Clayards, M., Niebuhr, O., Schuppler, B., & Zellers, M. (eds.), Rethinking reduction: Interdisciplinary perspectives on conditions, mechanisms, and domains for phonetic variation, De Gruyter Mouton (2017). https://halshs.archives-ouvertes.fr/halshs-01507312

9. Phonetic variation and contrast neutralization patterns in Romanian fricatives across different speaking styles
   In: Diversity and Speech Dynamics, May 2017, Herrsching am Ammersee, Germany (2017). https://hal.archives-ouvertes.fr/hal-01837181

10. Multimodal emotion recognition for AVEC 2016 challenge
    Hradis, Michal; Smrz, Pavel; Wood, Ian; Robin, Cécile; Matejka, Pavel; Otrusina, Lubomir; Popkova, Anna; Lamel, Lori; Povolny, Filip. ACM, 2017. Peer-reviewed.

    Abstract: This paper describes a system for emotion recognition and its application to the dataset from the AV+EC 2016 Emotion Recognition Challenge. The system was built and submitted to the AV+EC 2016 evaluation, making use of all three modalities (audio, video, and physiological data). Our work primarily focused on features derived from audio. The original audio features were complemented with bottleneck features and with text-based emotion recognition, which transcribes the audio using an automatic speech recognition system and applies resources such as word embedding models and sentiment lexicons. Our multimodal fusion reached CCC = 0.855 on the dev set for arousal and 0.713 for valence; on the test set, CCC is 0.719 and 0.596 for arousal and valence, respectively.

    Acknowledgments: This work has been funded by the European Union’s Horizon 2020 programme under grant agreements No. 644632 MixedEmotions and No. 645523 BISON, and by Technology Agency of the Czech Republic project No. TA04011311 “MINT”. It was also supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Defense US Army Research Laboratory contract number W911NF-12-C-0013.

    Keywords: Arousal; Bottleneck features; Emotion recognition; Neural networks; Regression; Speech transcription; Valence; Word embedding

    URL: https://doi.org/10.1145/2988257.2988268 ; http://hdl.handle.net/10379/7036