1 |
Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization ...
BASE
2 |
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation ...
3 |
Using heterogeneity in semi-supervised transcription hypotheses to improve code-switched speech recognition ...
4 |
Continual Learning for Monolingual End-to-End Automatic Speech Recognition ...
5 |
Assessing Evaluation Metrics for Speech-to-Speech Translation ...
6 |
What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis ...
8 |
Oriental Language Recognition (OLR) 2020: Summary and Analysis ...
9 |
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales ...
10 |
Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings ...
11 |
Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study ...
12 |
Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin ...
13 |
Applying Phonological Features in Multilingual Text-To-Speech ...
14 |
English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System ...
15 |
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features ...
16 |
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis ...
17 |
Synchronising speech segments with musical beats in Mandarin and English singing ...
18 |
Arabic Speech Recognition by End-to-End, Modular Systems and Human ...
Abstract:
Recent advances in automatic speech recognition (ASR) have achieved accuracy levels comparable to human transcribers, which led researchers to debate if the machine has reached human performance. Previous work focused on the English language and modular hidden Markov model-deep neural network (HMM-DNN) systems. In this paper, we perform a comprehensive benchmarking for end-to-end transformer ASR, modular HMM-DNN ASR, and human speech recognition (HSR) on the Arabic language and its dialects. For the HSR, we evaluate linguist performance and lay-native speaker performance on a new dataset collected as a part of this study. For ASR the end-to-end work led to 12.5%, 27.5%, 33.8% WER; a new performance milestone for the MGB2, MGB3, and MGB5 challenges respectively. Our results suggest that human performance in the Arabic language is still considerably better than the machine with an absolute WER gap of 3.5% on average. ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
URL: https://arxiv.org/abs/2101.08454 https://dx.doi.org/10.48550/arxiv.2101.08454
19 |
The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates ...
20 |
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance ...