
Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
The NTNU Taiwanese ASR System for Formosa Speech Recognition Challenge 2020 ...
BASE
2
Effective Decoder Masking for Transformer Based End-to-End Speech Recognition ...
Weng, Shi-Yan; Chen, Berlin. arXiv, 2020
Abstract: The attention-based encoder-decoder modeling paradigm has achieved promising results on a variety of speech processing tasks, such as automatic speech recognition (ASR) and text-to-speech (TTS), among others. This paradigm takes advantage of the generalization ability of neural networks to learn a direct mapping from an input sequence to an output sequence, without recourse to prior knowledge such as audio-text alignments or pronunciation lexicons. However, ASR models stemming from this paradigm are prone to overfitting, especially when the training data is limited. Inspired by SpecAugment and BERT-like masked language modeling, we propose in this paper a decoder-masking-based training approach for end-to-end (E2E) ASR models. During the training phase, we randomly replace some portions of the decoder's historical text input with the symbol [mask], in order to encourage the decoder to robustly output a correct token even when parts of its decoding history are masked or corrupted. The proposed approach is ... : More extensions and experiments are under exploration ...
Keyword: Audio and Speech Processing eess.AS; FOS Electrical engineering, electronic engineering, information engineering
URL: https://arxiv.org/abs/2010.14764
https://dx.doi.org/10.48550/arxiv.2010.14764
BASE
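
The masking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_decoder_history`, the per-token masking probability `mask_prob`, and the choice to leave the first (start-of-sequence) token untouched are all assumptions made for illustration, since the abstract only states that "some portions" of the decoder's history are replaced with [mask] during training.

```python
import random

MASK = "[mask]"  # mask symbol as named in the abstract


def mask_decoder_history(tokens, mask_prob=0.15, rng=None, keep_first=True):
    """Return a copy of the decoder input history with some tokens masked.

    Assumptions (not from the paper): each token is masked independently
    with probability `mask_prob`, and the first token (e.g. a
    start-of-sequence marker) is never masked when `keep_first` is True.
    """
    if rng is None:
        rng = random.Random()
    masked = []
    for i, tok in enumerate(tokens):
        if keep_first and i == 0:
            masked.append(tok)    # keep the start marker intact
        elif rng.random() < mask_prob:
            masked.append(MASK)   # hide this part of the history
        else:
            masked.append(tok)    # leave the token unchanged
    return masked


# Hypothetical example: only the decoder *input* is corrupted; the
# training targets would remain the original, unmasked tokens.
history = ["<sos>", "the", "cat", "sat", "down"]
print(mask_decoder_history(history, mask_prob=0.3, rng=random.Random(0)))
```

In such a setup the masking would be applied only during training, so that at inference time the decoder sees its ordinary, unmasked history.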
