62 |
Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition ...
BASE
63 |
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition ...
BASE
65 |
Code-Switching Text Augmentation for Multilingual Speech Processing ...
BASE
67 |
Self-supervised Learning with Random-projection Quantizer for Speech Recognition ...
Abstract:
We present a simple and effective self-supervised learning approach for speech recognition. The approach learns a model to predict the masked speech signals, in the form of discrete labels generated with a random-projection quantizer. In particular the quantizer projects speech inputs with a randomly initialized matrix, and does a nearest-neighbor lookup in a randomly-initialized codebook. Neither the matrix nor the codebook is updated during self-supervised learning. Since the random-projection quantizer is not trained and is separated from the speech recognition model, the design makes the approach flexible and is compatible with universal speech recognition architecture. On LibriSpeech our approach achieves similar word-error-rates as previous work using self-supervised learning with non-streaming models, and provides lower word-error-rates and latency than wav2vec 2.0 and w2v-BERT with streaming models. On multilingual tasks the approach also provides significant improvement over wav2vec 2.0 and w2v-BERT. ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://arxiv.org/abs/2202.01855 https://dx.doi.org/10.48550/arxiv.2202.01855
BASE
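The quantizer described in the abstract of entry 67 can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `RandomProjectionQuantizer`, the helper `masked_targets`, the Gaussian initialization, the squared Euclidean distance, and all dimensions are assumptions chosen for illustration. It only shows the three ideas the abstract states: a randomly initialized projection matrix, a nearest-neighbor lookup in a randomly initialized codebook, and no updates to either during self-supervised learning.

```python
import random


class RandomProjectionQuantizer:
    """Hypothetical sketch: frozen random projection plus frozen random codebook."""

    def __init__(self, input_dim, code_dim, codebook_size, seed=0):
        rng = random.Random(seed)
        # Projection matrix (input_dim x code_dim), drawn once and never updated.
        # Gaussian initialization is an assumption, not taken from the abstract.
        self.projection = [[rng.gauss(0.0, 1.0) for _ in range(code_dim)]
                           for _ in range(input_dim)]
        # Codebook of codebook_size vectors, also drawn once and never updated.
        self.codebook = [[rng.gauss(0.0, 1.0) for _ in range(code_dim)]
                         for _ in range(codebook_size)]

    def project(self, frame):
        # Multiply one input frame (a list of floats) by the projection matrix.
        code_dim = len(self.projection[0])
        return [sum(frame[i] * self.projection[i][j] for i in range(len(frame)))
                for j in range(code_dim)]

    def quantize(self, frame):
        # Nearest-neighbor lookup: index of the closest codebook vector.
        # Squared Euclidean distance is an assumption made for this sketch.
        z = self.project(frame)

        def sq_dist(code):
            return sum((a - b) ** 2 for a, b in zip(z, code))

        return min(range(len(self.codebook)),
                   key=lambda k: sq_dist(self.codebook[k]))


def masked_targets(frames, mask, quantizer):
    """Return {position: discrete label} for the masked positions only."""
    return {t: quantizer.quantize(frames[t])
            for t in range(len(frames)) if mask[t]}


# Toy usage with made-up feature frames; real inputs would be speech features.
quantizer = RandomProjectionQuantizer(input_dim=4, code_dim=3, codebook_size=8)
frames = [[0.1, -0.3, 0.5, 0.2], [1.0, 0.0, -1.0, 0.4], [0.0, 0.2, 0.1, -0.6]]
mask = [False, True, True]
targets = masked_targets(frames, mask, quantizer)
```

In this sketch the labels in `targets` would serve as prediction targets for the masked positions. Because the quantizer holds no trainable state, it can be built independently of whatever recognition model consumes those labels, which is the flexibility the abstract points to.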
69 |
CVSS Corpus and Massively Multilingual Speech-to-Speech Translation ...
BASE
71 |
Improving the fusion of acoustic and text representations in RNN-T ...
BASE
72 |
Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages ...
BASE
73 |
Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques ...
BASE
75 |
Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset ...
BASE
76 |
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis ...
BASE
77 |
Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers ...
BASE
79 |
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction ...
BASE
80 |
Fine-grained Noise Control for Multispeaker Speech Synthesis ...
BASE