
Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Learning Filterbanks from Raw Speech for Phoneme Recognition
In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada (2018) ; https://hal.archives-ouvertes.fr/hal-01888737
2
Sampling strategies in Siamese Networks for unsupervised speech representation learning
In: Interspeech 2018, Sep 2018, Hyderabad, India (2018) ; https://hal.archives-ouvertes.fr/hal-01888725
3
End-to-End Speech Recognition From the Raw Waveform
In: Interspeech 2018, Sep 2018, Hyderabad, India. ⟨10.21437/Interspeech.2018-2414⟩ (2018) ; https://hal.archives-ouvertes.fr/hal-01888739
Abstract: State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second one by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second one relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches and remove the need for a careful initialization in scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate of the trainable filterbanks relative to comparable mel-filterbanks. It is the first time end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large-vocabulary task under clean recording conditions.
Keyword: [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; [SCCO.LING] Cognitive science/Linguistics; [SCCO] Cognitive science; deep; end-to-end; gammatones; speech recognition; scattering; waveform
URL: https://doi.org/10.21437/Interspeech.2018-2414
https://hal.archives-ouvertes.fr/hal-01888739
https://hal.archives-ouvertes.fr/hal-01888739/file/Zeghidour_USCD_2018_End2end_from_wav.Interspeech.pdf
https://hal.archives-ouvertes.fr/hal-01888739/document
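
The abstract of entry 3 describes a front-end in which a learnable filterbank replaces fixed mel-filterbanks: a bank of convolutional filters applied to the raw waveform, a non-linearity, a low-pass filter for decimation, and an instance normalization layer. The following is a minimal, hypothetical PyTorch sketch of that kind of front-end; the class name, layer sizes, stride, and the exact non-linearity are illustrative assumptions, not the authors' released implementation.

# Hedged sketch (not the paper's code): a trainable time-domain filterbank
# front-end -- 1-D convolution over the raw waveform, squared-modulus
# non-linearity, learnable low-pass filter with decimation, log compression,
# and instance normalization. All sizes are illustrative.
import torch
import torch.nn as nn

class TrainableFilterbank(nn.Module):
    def __init__(self, n_filters=40, filter_len=400, lowpass_len=400, stride=160):
        super().__init__()
        # Bank of learnable band-pass filters on the raw waveform
        # (roughly 25 ms filters at 16 kHz for filter_len=400).
        self.bandpass = nn.Conv1d(1, n_filters, filter_len,
                                  stride=1, padding=filter_len // 2, bias=False)
        # Learnable low-pass filter, one per channel (depthwise convolution),
        # with a 10 ms stride at 16 kHz to decimate to frame rate.
        self.lowpass = nn.Conv1d(n_filters, n_filters, lowpass_len,
                                 stride=stride, padding=lowpass_len // 2,
                                 groups=n_filters, bias=False)
        # Instance normalization, the first modification studied in the paper.
        self.norm = nn.InstanceNorm1d(n_filters)

    def forward(self, waveform):
        # waveform: (batch, 1, samples) raw audio
        x = self.bandpass(waveform) ** 2        # squared-modulus rectification
        x = self.lowpass(x)                     # smooth and decimate
        x = torch.log1p(torch.abs(x))           # log compression
        return self.norm(x)                     # per-utterance normalization

# Example: two 1-second utterances at 16 kHz -> (batch, n_filters, frames)
features = TrainableFilterbank()(torch.randn(2, 1, 16000))
print(features.shape)

The resulting feature map plays the role of mel-filterbank features and can feed the same acoustic model, with the filterbank parameters trained jointly with the recognizer.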
4
SING: Symbol-to-Instrument Neural Generator
In: Conference on Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada (2018) ; https://hal.archives-ouvertes.fr/hal-01899949
5
Learning Weakly Supervised Multimodal Phoneme Embeddings
In: Interspeech 2017, 2017, Stockholm, Sweden. ⟨10.21437/Interspeech.2017-1689⟩ (2017) ; https://hal.inria.fr/hal-01687415

Results by source type: Catalogues 0; Bibliographies 0; Linked Open Data catalogues 0; Online resources 0; Open access documents 5