
Search in the Catalogues and Directories







Hits 1 – 20 of 155

1	Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization ...
	Yan, Brian; Zhang, Chunlei; Yu, Meng. - : arXiv, 2021
	BASE

2	Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation ...
	Inaguma, Hirofumi; Kawahara, Tatsuya; Watanabe, Shinji. - : arXiv, 2021
	BASE

3	Using heterogeneity in semi-supervised transcription hypotheses to improve code-switched speech recognition ...
	Slottje, Andrew; Wotherspoon, Shannon; Hartmann, William. - : arXiv, 2021
	BASE

4	Continual Learning for Monolingual End-to-End Automatic Speech Recognition ...
	Eeckt, Steven Vander; Van hamme, Hugo. - : arXiv, 2021
	BASE

5	Assessing Evaluation Metrics for Speech-to-Speech Translation ...
	Salesky, Elizabeth; Mäder, Julian; Klinger, Severin. - : arXiv, 2021
	BASE

6	What do End-to-End Speech Models Learn about Speaker, Language and Channel Information? A Layer-wise and Neuron-level Analysis ...
	Chowdhury, Shammur Absar; Durrani, Nadir; Ali, Ahmed. - : arXiv, 2021
	BASE

7	Integrating Categorical Features in End-to-End ASR ...
	Huang, Rongqing. - : arXiv, 2021
	BASE

8	Oriental Language Recognition (OLR) 2020: Summary and Analysis ...
	Li, Jing; Wang, Binling; Zhi, Yiming. - : arXiv, 2021
	BASE

9	Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales ...
	Andreas, Jacob; Beguš, Gašper; Bronstein, Michael M.. - : arXiv, 2021
	BASE

10	Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings ...
	Zhu, Chengrui; An, Keyu; Zheng, Huahuan. - : arXiv, 2021
	BASE

11	Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study ...
	Abdullah, Badr M.; Mosbach, Marius; Zaitova, Iuliia. - : arXiv, 2021
	BASE

12	Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin ...
	Durante, Zane; Mathur, Leena; Ye, Eric. - : arXiv, 2021
	BASE

13	Applying Phonological Features in Multilingual Text-To-Speech ...
	Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2021
	BASE

14	English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System ...
	Cámbara, Guillermo; Peiró-Lilja, Alex; Farrús, Mireia. - : arXiv, 2021
	BASE

15	Cross-lingual Low Resource Speaker Adaptation Using Phonological Features ...
	Maniati, Georgia; Ellinas, Nikolaos; Markopoulos, Konstantinos. - : arXiv, 2021
	BASE

16	Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis ...
	Zhou, Yixuan; Song, Changhe; Li, Jingbei. - : arXiv, 2021
	BASE

17	Synchronising speech segments with musical beats in Mandarin and English singing ...
	Zhang, Cong; Zhu, Jian. - : arXiv, 2021
	BASE

18	Arabic Speech Recognition by End-to-End, Modular Systems and Human ...
	Hussein, Amir; Watanabe, Shinji; Ali, Ahmed. - : arXiv, 2021
	Abstract: Recent advances in automatic speech recognition (ASR) have achieved accuracy levels comparable to human transcribers, which has led researchers to debate whether the machine has reached human performance. Previous work focused on the English language and modular hidden Markov model-deep neural network (HMM-DNN) systems. In this paper, we perform a comprehensive benchmarking of end-to-end transformer ASR, modular HMM-DNN ASR, and human speech recognition (HSR) on the Arabic language and its dialects. For the HSR, we evaluate linguist performance and lay-native speaker performance on a new dataset collected as a part of this study. For ASR, the end-to-end work led to 12.5%, 27.5%, and 33.8% WER, a new performance milestone for the MGB2, MGB3, and MGB5 challenges, respectively. Our results suggest that human performance in the Arabic language is still considerably better than the machine, with an absolute WER gap of 3.5% on average. ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
	URL: https://arxiv.org/abs/2101.08454 https://dx.doi.org/10.48550/arxiv.2101.08454
	BASE

19	The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates ...
	Schuller, Björn W.; Batliner, Anton; Bergler, Christian. - : arXiv, 2021
	BASE

20	Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance ...
	Luong, Hieu-Thi; Yamagishi, Junichi. - : arXiv, 2021
	BASE
