1 |
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization ...
|
|
Yan, Brian; Zhang, Chunlei; Yu, Meng; Zhang, Shi-Xiong; Dalmia, Siddharth; Berrebbi, Dan; Weng, Chao; Watanabe, Shinji; Yu, Dong. arXiv, 2021
|
|
Abstract:
Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type. In this work, we propose a general framework to jointly model the likelihoods of the monolingual and code-switch sub-tasks that comprise bilingual speech recognition. By defining the monolingual sub-tasks with label-to-frame synchronization, our joint modeling framework can be conditionally factorized such that the final bilingual output, which may or may not be code-switched, is obtained given only monolingual information. We show that this conditionally factorized joint framework can be modeled by an end-to-end differentiable neural network. We demonstrate the efficacy of our proposed model on bilingual Mandarin-English speech recognition across both monolingual and code-switched corpora. ...
|
|
Keywords:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
|
|
URL: https://dx.doi.org/10.48550/arxiv.2111.15016 https://arxiv.org/abs/2111.15016
|
|
BASE
2 |
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation ...
3 |
Self-Guided Curriculum Learning for Neural Machine Translation ...
4 |
Arabic Speech Recognition by End-to-End, Modular Systems and Human ...
5 |
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yoloxóchitl Mixtec ...
7 |
Leveraging Pre-trained Language Model for Speech Sentiment Analysis ...
8 |
End-to-end ASR to jointly predict transcriptions and linguistic annotations ...
9 |
Differentiable Allophone Graphs for Language-Universal Speech Recognition ...
10 |
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021 ...
11 |
CHiME-6 Challenge: Tackling multispeaker speech recognition for unsegmented recordings
|
|
|
|
In: CHiME 2020 - 6th International Workshop on Speech Processing in Everyday Environments, May 2020, Barcelona / Virtual, Spain (2020) ; https://hal.inria.fr/hal-02546993
14 |
A Comparative Study on Transformer vs RNN in Speech Applications ...
16 |
Towards Online End-to-end Transformer Automatic Speech Recognition ...
18 |
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
|
|
|
|
In: Interspeech 2018 - 19th Annual Conference of the International Speech Communication Association, Sep 2018, Hyderabad, India (2018) ; https://hal.inria.fr/hal-01744021
19 |
Analysis of Multilingual Sequence-to-Sequence speech recognition systems ...
20 |
Language model integration based on memory control for sequence to sequence speech recognition ...