Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	The NTNU Taiwanese ASR System for Formosa Speech Recognition Challenge 2020 ...
	Chao, Fu-An; Lo, Tien-Hong; Weng, Shi-Yan. - : arXiv, 2021
	BASE

2	Effective Decoder Masking for Transformer Based End-to-End Speech Recognition ...
	Weng, Shi-Yan; Chen, Berlin. - : arXiv, 2020
	Abstract: The attention-based encoder-decoder modeling paradigm has achieved promising results on a variety of speech processing tasks such as automatic speech recognition (ASR) and text-to-speech (TTS), among others. This paradigm takes advantage of the generalization ability of neural networks to learn a direct mapping from an input sequence to an output sequence, without recourse to prior knowledge such as audio-text alignments or pronunciation lexicons. However, ASR models stemming from this paradigm are prone to overfitting, especially when the training data is limited. Inspired by SpecAugment and BERT-like masked language modeling, we propose in this paper a decoder-masking-based training approach for end-to-end (E2E) ASR models. During the training phase, we randomly replace some portions of the decoder's historical text input with the symbol [mask], in order to encourage the decoder to robustly output a correct token even when parts of its decoding history are masked or corrupted. The proposed approach is ... : More extensions and experiments are under exploration ...
	Keyword: Audio and Speech Processing eess.AS; FOS Electrical engineering, electronic engineering, information engineering
	URL: https://arxiv.org/abs/2010.14764 https://dx.doi.org/10.48550/arxiv.2010.14764
	BASE
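	Illustration: The masking step described in the abstract of entry 2 could be sketched roughly as follows. This is only a minimal sketch, not the authors' implementation: the function name `mask_decoder_history`, the per-token independent masking, the default rate of 0.15, and the protected `<sos>` start token are all assumptions made for illustration. The abstract only states that "some portions" of the decoder's historical text input are replaced with [mask] during training.

```python
import random
from typing import List, Optional, Sequence

MASK = "[mask]"  # the mask symbol named in the abstract


def mask_decoder_history(
    tokens: Sequence[str],
    mask_prob: float = 0.15,               # assumed rate, not taken from the paper
    protected: Sequence[str] = ("<sos>",),  # assumed: never mask the start token
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a copy of the decoder input with random tokens replaced by [mask].

    Each unprotected token is masked independently with probability
    `mask_prob`. The prediction targets are left untouched by the caller,
    so the decoder must still emit the correct token from a corrupted history.
    """
    if rng is None:
        rng = random.Random()
    masked = []
    for tok in tokens:
        if tok not in protected and rng.random() < mask_prob:
            masked.append(MASK)
        else:
            masked.append(tok)
    return masked


# Teacher-forcing style usage: the history is corrupted, the targets are not.
history = ["<sos>", "the", "cat", "sat"]
targets = ["the", "cat", "sat", "<eos>"]
noisy_history = mask_decoder_history(history, mask_prob=0.3, rng=random.Random(0))
```

	Masking would be applied only during training. At inference time the decoder would receive its own unmasked output history.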
