21 | Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation ...
BASE
22 | Can Social Robots Effectively Elicit Curiosity in STEM Topics from K-1 Students During Oral Assessments? ...
23 | An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production ...
24 | Expression-preserving face frontalization improves visually assisted speech processing ...
25 | Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition ...
26 | Multi Antenna Radar System for American Sign Language (ASL) Recognition Using Deep Learning ...
27 | Effect of Kinematics and Fluency in Adversarial Synthetic Data Generation for ASL Recognition with RF Sensors ...
30 | Hierarchical Softmax for End-to-End Low-resource Multilingual Speech Recognition ...
31 | Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training ...
32 | Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition ...
33 | VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge ...
34 | Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks ...
Abstract:
Unsupervised cross-lingual speech representation learning (XLSR) has recently shown promising results in speech recognition by leveraging vast amounts of unlabeled data across multiple languages. However, the standard XLSR model suffers from a language interference problem due to its lack of language-specific modeling ability. In this work, we investigate language adaptive training on XLSR models. More importantly, we propose a novel language adaptive pre-training approach based on sparse sharing sub-networks. It makes room for language-specific modeling by pruning out unimportant parameters for each language, without requiring any manually designed language-specific components. After pruning, each language maintains only a sparse sub-network, while the sub-networks are partially shared with each other. Experimental results on a downstream multilingual speech recognition task show that our proposed method significantly outperforms baseline XLSR models on both high-resource and low-resource languages. Besides, our ... (To appear in ICASSP 2022.)

Keywords:
Audio and Speech Processing (eess.AS); Sound (cs.SD); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering

URL: https://arxiv.org/abs/2203.04583 https://dx.doi.org/10.48550/arxiv.2203.04583
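The abstract above describes pruning a shared network into partially overlapping, per-language sparse sub-networks. A minimal sketch of that idea, assuming top-k pruning over a single shared weight matrix; the per-language importance scores are random stand-ins for importance estimated on each language's data, and all names here are illustrative, not the paper's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared parameter matrix, standing in for a layer of the XLSR encoder.
shared = rng.normal(size=(4, 4))

def prune_mask(importance, keep_ratio):
    """Binary mask keeping the top `keep_ratio` fraction of entries by importance."""
    k = int(keep_ratio * importance.size)
    threshold = np.sort(importance.ravel())[-k]
    return (importance >= threshold).astype(shared.dtype)

# Per-language importance scores; in the paper these would be estimated from
# each language's training signal, here they are random for illustration.
masks = {
    lang: prune_mask(rng.random(shared.shape), keep_ratio=0.5)
    for lang in ("en", "zh")
}

def language_forward(x, lang):
    # Each language's forward pass uses only its own sparse sub-network.
    return x @ (shared * masks[lang])

# Each sub-network keeps half the weights, yet the masks partially overlap,
# so parameters in the intersection remain shared across languages.
overlap = int((masks["en"] * masks["zh"]).sum())
y = language_forward(np.ones((1, 4)), "en")
```

The intersection of the masks is what carries cross-lingual sharing; weights outside a language's mask leave room for the other languages, which is how the method avoids hand-designed language-specific components.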
36 | Code-Switching Text Augmentation for Multilingual Speech Processing ...
39 | Self-supervised Learning with Random-projection Quantizer for Speech Recognition ...