61 | Filter-based Discriminative Autoencoders for Children Speech Recognition ...
BASE
62 | Transducer-based language embedding for spoken language identification ...
63 | Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
64 | Multi-sequence Intermediate Conditioning for CTC-based ASR ...
65 | Code Switched and Code Mixed Speech Recognition for Indic languages ...
67 | Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
68 | Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
69 | MAESTRO: Matched Speech Text Representations through Modality Matching ...

Abstract: We present Maestro, a self-supervised training method to unify representations learnt from speech and text modalities. Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to capture lexical information. Learning aligned representations from unpaired speech and text sequences is a challenging task. Previous work either implicitly enforced the representations learnt from these two modalities to be aligned in the latent space, through multitasking and parameter sharing, or explicitly, through conversion between modalities via speech synthesis. While the former suffers from interference between the two modalities, the latter introduces additional complexity. In this paper, we propose Maestro, a novel algorithm to learn unified representations from both these modalities simultaneously that can transfer to diverse downstream tasks such as Automatic Speech Recognition (ASR) and Speech Translation (ST). Maestro learns ... (Submitted to Interspeech 2022)

Keywords: 68T10; Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; I.2.7; Sound (cs.SD)

URL: https://arxiv.org/abs/2204.03409
DOI: https://dx.doi.org/10.48550/arxiv.2204.03409
72 | Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
75 | Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
76 | Cochlear Implant Results in Older Adults with Post-Lingual Deafness: The Role of “Top-Down” Neurocognitive Mechanisms
In: International Journal of Environmental Research and Public Health; Volume 19; Issue 3; Pages: 1343 (2022)
77 | MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension
In: Applied Sciences; Volume 12; Issue 2; Pages: 804 (2022)
78 | On the Difference of Scoring in Speech in Babble Tests
In: Healthcare; Volume 10; Issue 3; Pages: 458 (2022)
79 | An Empirical Performance Analysis of the Speak Correct Computerized Interface
In: Processes; Volume 10; Issue 3; Pages: 487 (2022)
80 | DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...