62 | Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition ...
BASE
63 | Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition ...
Abstract:
State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech, but performance on impaired speech remains an issue. The current study explores the usefulness of Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult because several aspects of speech, such as articulation, prosody and phonation, can be impaired. Specifically, we train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model. Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance. In particular, features from the multilingual model led to lower WERs than filterbank (Fbank) features or features from models trained on a single language. Improvements were observed in English speakers with dysarthria caused by cerebral palsy (UASpeech corpus), Spanish speakers with Parkinsonian dysarthria (PC-GITA corpus) and ... : Submitted for review at Interspeech 2022 ...
Keywords:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2204.01670 https://arxiv.org/abs/2204.01670
65 | Code-Switching Text Augmentation for Multilingual Speech Processing ...
67 | Self-supervised Learning with Random-projection Quantizer for Speech Recognition ...
69 | CVSS Corpus and Massively Multilingual Speech-to-Speech Translation ...
71 | Improving the fusion of acoustic and text representations in RNN-T ...
72 | Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages ...
73 | Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques ...
75 | Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset ...
76 | WavThruVec: Latent speech representation as intermediate features for neural speech synthesis ...
77 | Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers ...
79 | A Character-level Span-based Model for Mandarin Prosodic Structure Prediction ...
80 | Fine-grained Noise Control for Multispeaker Speech Synthesis ...