
Hits 1 – 11 of 11

1	Self-supervised Learning with Random-projection Quantizer for Speech Recognition ...
	Chiu, Chung-Cheng; Qin, James; Zhang, Yu. - : arXiv, 2022
	BASE

2	Unsupervised Data Selection via Discrete Speech Representation for ASR ...
	Lu, Zhiyun; Wang, Yongqiang; Zhang, Yu. - : arXiv, 2022
	BASE

3	MAESTRO: Matched Speech Text Representations through Modality Matching ...
	Chen, Zhehuai; Zhang, Yu; Rosenberg, Andrew. - : arXiv, 2022
	BASE

4	Joint Unsupervised and Supervised Training for Multilingual ASR ...
	Bai, Junwen; Li, Bo; Zhang, Yu; Bapna, Ankur; Siddhartha, Nikhil; Sim, Khe Chai; Sainath, Tara N.. - : arXiv, 2021
	Abstract: Self-supervised training has shown promising gains in pretraining models and facilitating the downstream finetuning for speech recognition, like multilingual ASR. Most existing methods adopt a 2-stage scheme where the self-supervised loss is optimized in the first pretraining stage, and the standard supervised finetuning resumes in the second stage. In this paper, we propose an end-to-end (E2E) Joint Unsupervised and Supervised Training (JUST) method to combine the supervised RNN-T loss and the self-supervised contrastive and masked language modeling (MLM) losses. We validate its performance on the public dataset Multilingual LibriSpeech (MLS), which includes 8 languages and is extremely imbalanced. On MLS, we explore (1) JUST trained from scratch, and (2) JUST finetuned from a pretrained checkpoint. Experiments show that JUST can consistently outperform other existing state-of-the-art methods, and beat the monolingual baseline by a significant margin, demonstrating JUST's capability of handling low-resource ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
	URL: https://arxiv.org/abs/2111.08137 https://dx.doi.org/10.48550/arxiv.2111.08137
	BASE

5	Scaling End-to-End Models for Large-Scale Multilingual ASR ...
	Li, Bo; Pang, Ruoming; Sainath, Tara N.. - : arXiv, 2021
	BASE

6	Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition ...
	Li, Qiujia; Zhang, Yu; Qiu, David. - : arXiv, 2021
	BASE

7	Injecting Text in Self-Supervised Speech Pretraining ...
	Chen, Zhehuai; Zhang, Yu; Rosenberg, Andrew. - : arXiv, 2021
	BASE

8	Large-scale multilingual audio visual dubbing ...
	Yang, Yi; Shillingford, Brendan; Assael, Yannis. - : arXiv, 2020
	BASE

9	Speech Recognition with Augmented Synthesized Speech ...
	Rosenberg, Andrew; Zhang, Yu; Ramabhadran, Bhuvana. - : arXiv, 2019
	BASE

10	Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes ...
	Li, Bo; Zhang, Yu; Sainath, Tara. - : arXiv, 2018
	BASE

11	Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data ...
	Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
	BASE
