
Search in the Catalogues and Directories

Hits 1 – 11 of 11

1
Self-supervised Learning with Random-projection Quantizer for Speech Recognition ...
BASE
2
Unsupervised Data Selection via Discrete Speech Representation for ASR ...
Lu, Zhiyun; Wang, Yongqiang; Zhang, Yu. - : arXiv, 2022
BASE
3
MAESTRO: Matched Speech Text Representations through Modality Matching ...
BASE
4
Joint Unsupervised and Supervised Training for Multilingual ASR ...
Abstract: Self-supervised training has shown promising gains in pretraining models and facilitating the downstream finetuning for speech recognition, like multilingual ASR. Most existing methods adopt a 2-stage scheme where the self-supervised loss is optimized in the first pretraining stage, and the standard supervised finetuning resumes in the second stage. In this paper, we propose an end-to-end (E2E) Joint Unsupervised and Supervised Training (JUST) method to combine the supervised RNN-T loss and the self-supervised contrastive and masked language modeling (MLM) losses. We validate its performance on the public dataset Multilingual LibriSpeech (MLS), which includes 8 languages and is extremely imbalanced. On MLS, we explore (1) JUST trained from scratch, and (2) JUST finetuned from a pretrained checkpoint. Experiments show that JUST can consistently outperform other existing state-of-the-art methods, and beat the monolingual baseline by a significant margin, demonstrating JUST's capability of handling low-resource ...
Keyword: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning (cs.LG); Sound (cs.SD)
URL: https://arxiv.org/abs/2111.08137
https://dx.doi.org/10.48550/arxiv.2111.08137
BASE
5
Scaling End-to-End Models for Large-Scale Multilingual ASR ...
BASE
6
Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition ...
Li, Qiujia; Zhang, Yu; Qiu, David. - : arXiv, 2021
BASE
7
Injecting Text in Self-Supervised Speech Pretraining ...
BASE
8
Large-scale multilingual audio visual dubbing ...
BASE
9
Speech Recognition with Augmented Synthesized Speech ...
BASE
10
Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes ...
Li, Bo; Zhang, Yu; Sainath, Tara. - : arXiv, 2018
BASE
11
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
BASE

© 2013 - 2024 Lin|gu|is|tik