
Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Learning Filterbanks from Raw Speech for Phoneme Recognition
In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada (2018) ; https://hal.archives-ouvertes.fr/hal-01888737
2
Sampling strategies in Siamese Networks for unsupervised speech representation learning
In: Interspeech 2018, Sep 2018, Hyderabad, India (2018) ; https://hal.archives-ouvertes.fr/hal-01888725
3
End-to-End Speech Recognition From the Raw Waveform
In: Interspeech 2018, Sep 2018, Hyderabad, India. ⟨10.21437/Interspeech.2018-2414⟩ (2018) ; https://hal.archives-ouvertes.fr/hal-01888739
Abstract: State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second one by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second one relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches and remove the need for a careful initialization in scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate of the trainable filterbanks relative to comparable mel-filterbanks. It is the first time end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large-vocabulary task under clean recording conditions.
Keyword: [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; [SCCO.LING] Cognitive science/Linguistics; [SCCO] Cognitive science; deep; end-to-end; gammatones; speech recognition; scattering; waveform
URL: https://doi.org/10.21437/Interspeech.2018-2414
https://hal.archives-ouvertes.fr/hal-01888739
https://hal.archives-ouvertes.fr/hal-01888739/file/Zeghidour_USCD_2018_End2end_from_wav.Interspeech.pdf
https://hal.archives-ouvertes.fr/hal-01888739/document
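
The abstract of entry 3 describes a front-end in which a learnable filterbank replaces fixed mel-filterbanks: a bank of convolutional filters applied to the raw waveform, a non-linearity, a low-pass filter for decimation, and an instance normalization layer. The following is a minimal, hypothetical PyTorch sketch of that kind of front-end; the class name, layer sizes, stride, and the exact non-linearity are illustrative assumptions, not the authors' released implementation.

# Hedged sketch (not the paper's code): a trainable time-domain filterbank
# front-end -- 1-D convolution over the raw waveform, squared-modulus
# non-linearity, learnable low-pass filter with decimation, log compression,
# and instance normalization. All sizes are illustrative.
import torch
import torch.nn as nn

class TrainableFilterbank(nn.Module):
    def __init__(self, n_filters=40, filter_len=400, lowpass_len=400, stride=160):
        super().__init__()
        # Bank of learnable band-pass filters on the raw waveform
        # (roughly 25 ms filters at 16 kHz for filter_len=400).
        self.bandpass = nn.Conv1d(1, n_filters, filter_len,
                                  stride=1, padding=filter_len // 2, bias=False)
        # Learnable low-pass filter, one per channel (depthwise convolution),
        # with a 10 ms stride at 16 kHz to decimate to frame rate.
        self.lowpass = nn.Conv1d(n_filters, n_filters, lowpass_len,
                                 stride=stride, padding=lowpass_len // 2,
                                 groups=n_filters, bias=False)
        # Instance normalization, the first modification studied in the paper.
        self.norm = nn.InstanceNorm1d(n_filters)

    def forward(self, waveform):
        # waveform: (batch, 1, samples) raw audio
        x = self.bandpass(waveform) ** 2        # squared-modulus rectification
        x = self.lowpass(x)                     # smooth and decimate
        x = torch.log1p(torch.abs(x))           # log compression
        return self.norm(x)                     # per-utterance normalization

# Example: two 1-second utterances at 16 kHz -> (batch, n_filters, frames)
features = TrainableFilterbank()(torch.randn(2, 1, 16000))
print(features.shape)

The resulting feature map plays the role of mel-filterbank features and can feed the same acoustic model, with the filterbank parameters trained jointly with the recognizer.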
4
SING: Symbol-to-Instrument Neural Generator
In: Conference on Neural Information Processing Systems (NIPS), Dec 2018, Montréal, Canada (2018) ; https://hal.archives-ouvertes.fr/hal-01899949
5
Learning Weakly Supervised Multimodal Phoneme Embeddings
In: Interspeech 2017, 2017, Stockholm, Sweden. ⟨10.21437/Interspeech.2017-1689⟩ (2017) ; https://hal.inria.fr/hal-01687415

Results by source type: Catalogues 0; Bibliographies 0; Linked Open Data catalogues 0; Online resources 0; Open access documents 5