1 | Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training ...
3 | Improving the fusion of acoustic and text representations in RNN-T ...
4 | Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model ...
5 | Separate What You Describe: Language-Queried Audio Source Separation ...
6 | Chain-based Discriminative Autoencoders for Speech Recognition ...
7 | Unsupervised word-level prosody tagging for controllable speech synthesis ...
8 | Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales ...
9 | Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms ...
10 | An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation ...
12 | Speech2Slot: An End-to-End Knowledge-based Slot Filling from Speech ...
13 | NIST SRE CTS Superset: A large-scale dataset for telephony speaker recognition ...
14 | Interpreting intermediate convolutional layers of CNNs trained on raw speech ...
15 | LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading ...
16 | SERAB: A multi-lingual benchmark for speech emotion recognition ...
17 | Detecting Emotion Carriers by Combining Acoustic and Lexical Representations ...
18 | Textless Speech Emotion Conversion using Discrete and Decomposed Representations ...
19 | Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation ...
20 | Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion ...