1 |
Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition ...
BASE
3 |
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis ...
BASE
4 |
Fine-grained Noise Control for Multispeaker Speech Synthesis ...
BASE
5 |
Emotion Intensity and its Control for Emotional Voice Conversion ...
BASE
6 |
Low-dimensional representation of infant and adult vocalization acoustics ...
BASE
7 |
Chain-based Discriminative Autoencoders for Speech Recognition ...
Abstract:
In our previous work, we proposed a discriminative autoencoder (DcAE) for speech recognition. DcAE combines two training schemes into one. First, since DcAE aims to learn encoder-decoder mappings, the squared error between the reconstructed speech and the input speech is minimized. Second, in the code layer, frame-based phonetic embeddings are obtained by minimizing the categorical cross-entropy between ground truth labels and predicted triphone-state scores. DcAE is developed based on the Kaldi toolkit by treating various TDNN models as encoders. In this paper, we further propose three new versions of DcAE. First, a new objective function that considers both categorical cross-entropy and mutual information between ground truth and predicted triphone-state sequences is used. The resulting DcAE is called a chain-based DcAE (c-DcAE). For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE. In these two models, both the ... : Submitted to Interspeech 2022 ...
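The two training signals described in the abstract (a reconstruction error on the decoder output and a categorical cross-entropy on the code-layer predictions) can be illustrated with a minimal sketch. This is not the authors' Kaldi implementation: the function names, the use of a mean rather than a sum, and the `ce_weight` balancing factor are all assumptions made for illustration, and the chain (mutual-information) term of c-DcAE is not modeled here.

```python
import math


def squared_error(reconstructed, original):
    """Mean squared error between reconstructed and input feature frames."""
    total = 0.0
    count = 0
    for rec_frame, in_frame in zip(reconstructed, original):
        for r, x in zip(rec_frame, in_frame):
            total += (r - x) ** 2
            count += 1
    return total / count


def cross_entropy(logits, labels):
    """Mean categorical cross-entropy; labels are integer class indices."""
    total = 0.0
    for frame_logits, label in zip(logits, labels):
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(frame_logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in frame_logits))
        total += log_norm - frame_logits[label]
    return total / len(labels)


def dcae_style_loss(reconstructed, original, logits, labels, ce_weight=1.0):
    """Hypothetical combined objective: reconstruction + weighted classification.

    ce_weight is an assumed hyperparameter, not taken from the paper.
    """
    return squared_error(reconstructed, original) + ce_weight * cross_entropy(logits, labels)
```

In this sketch, `reconstructed` and `original` are lists of feature frames, `logits` holds one score vector per frame, and `labels` holds the corresponding target class index per frame (standing in for triphone-state labels).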
Keywords:
Artificial Intelligence cs.AI; Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Multimedia cs.MM; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2203.13687 https://arxiv.org/abs/2203.13687
BASE
8 |
Filter-based Discriminative Autoencoders for Children Speech Recognition ...
BASE
9 |
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
BASE
10 |
Continual Learning for Monolingual End-to-End Automatic Speech Recognition ...
BASE
11 |
Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales ...
BASE
12 |
Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin ...
BASE
13 |
Applying Phonological Features in Multilingual Text-To-Speech ...
BASE
14 |
English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System ...
BASE
15 |
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features ...
BASE
16 |
Arabic Speech Recognition by End-to-End, Modular Systems and Human ...
BASE
17 |
The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates ...
BASE
18 |
Discrete representations in neural models of spoken language ...
BASE
19 |
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis ...
BASE
20 |
Learning De-identified Representations of Prosody from Raw Audio ...
BASE