Hits 1 – 4 of 4

1	Libri-Light: a benchmark for ASR with limited or no supervision
	Kahn, Jacob; Rivière, Morgane; Zheng, Weiyi...
	In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2020, Barcelona / Virtual, Spain. pp.7669-7673, ⟨10.1109/ICASSP40776.2020.9052942⟩ (2020) ; https://hal.archives-ouvertes.fr/hal-02959460
	BASE

2	Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
	Kharitonov, Eugene; Rivière, Morgane; Synnaeve, Gabriel; Wolf, Lior; Mazaré, Pierre-Emmanuel; Douze, Matthijs; Dupoux, Emmanuel
	In: SLT 2020 - IEEE Spoken Language Technology Workshop, Dec 2020, Shenzhen / Virtual, China (2020) ; https://hal.archives-ouvertes.fr/hal-03070321
	Abstract: Contrastive Predictive Coding (CPC), which predicts future segments of speech from past segments, is emerging as a powerful algorithm for representation learning of speech signals. However, it still underperforms other methods on unsupervised evaluation benchmarks. Here, we introduce WavAugment, a time-domain data augmentation library, and find that applying augmentation to the past segments is generally more efficient and yields better performance than other methods. We find that a combination of pitch modification, additive noise, and reverberation substantially increases the performance of CPC (relative improvement of 18-22%), beating the reference Libri-light results with 600 times less data. Using an out-of-domain dataset, time-domain data augmentation can push CPC to be on par with the state of the art on the Zero Speech Benchmark 2017. We also show that time-domain data augmentation consistently improves downstream limited-supervision phoneme classification tasks by 12-15% relative.
	Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; Contrastive predictive coding; Data augmentation; Speech recognition; Unsupervised representation learning
	URL: https://hal.archives-ouvertes.fr/hal-03070321 https://hal.archives-ouvertes.fr/hal-03070321/file/2007.00991.pdf https://hal.archives-ouvertes.fr/hal-03070321/document
	BASE

3	Unsupervised pretraining transfers well across languages
	Rivière, Morgane; Joulin, Armand; Mazaré, Pierre-Emmanuel...
	In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2020, Barcelona / Virtual, Spain. pp.7414-7418, ⟨10.1109/ICASSP40776.2020.9054548⟩ (2020) ; https://hal.archives-ouvertes.fr/hal-02959418
	BASE

4	Unsupervised pretraining transfers well across languages ...
	Rivière, Morgane; Joulin, Armand; Mazaré, Pierre-Emmanuel. - : arXiv, 2020
	BASE
