
Search in the Catalogues and Directories

Hits 1 – 11 of 11

1
Simple and Effective Unsupervised Speech Synthesis ...
BASE
2
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ...
BASE
3
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-02912029 ; Interspeech 2020, Oct 2020, Shanghai, China (2020)
BASE
4
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning ...
Abstract: Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we propose Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extracting methods on linear ... : Proceedings of Interspeech, 2020 ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
URL: https://arxiv.org/abs/2006.02547
https://dx.doi.org/10.48550/arxiv.2006.02547
BASE
5
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech ...
BASE
6
Transfer Learning from Audio-Visual Grounding to Speech Recognition ...
BASE
7
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition ...
Hsu, Wei-Ning; Tang, Hao; Glass, James. - : arXiv, 2018
BASE
8
Unsupervised Representation Learning of Speech for Dialect Identification ...
BASE
9
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
BASE
10
Learning Latent Representations for Speech Generation and Transformation ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
BASE
11
Recurrent Neural Network Encoder with Attention for Community Question Answering ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2016
BASE
