
Search in the Catalogues and Directories

Hits 1 – 20 of 21

1
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training ...
BASE
2
Simple and Effective Unsupervised Speech Synthesis ...
BASE
3
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
In: INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329245 ; INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic (2021)
BASE
4
On Generative Spoken Language Modeling from Raw Audio
In: EISSN: 2307-387X ; Transactions of the Association for Computational Linguistics ; https://hal.inria.fr/hal-03329219 ; Transactions of the Association for Computational Linguistics, The MIT Press, 2021 (2021)
BASE
5
Generative Spoken Language Modeling from Raw Audio ...
BASE
6
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ...
Read paper: https://www.aclanthology.org/2021.acl-long.411
Abstract: In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision. Instead, we connect the image captioning module and the speech synthesis module with a set of discrete, sub-word speech units that are discovered with a self-supervised visual grounding task. We conduct experiments on the Flickr8k spoken caption dataset in addition to a novel corpus of spoken audio captions collected for the popular MSCOCO dataset, demonstrating that our generated captions also capture diverse visual semantics of the images they describe. We investigate several different intermediate speech representations, and empirically find that the representation must satisfy several important properties to serve as drop-in replacements for text. ...
Keyword: Computational Linguistics; Condensed Matter Physics; Deep Learning; Electromagnetism; FOS Physical sciences; Information and Knowledge Engineering; Neural Network; Semantics
URL: https://underline.io/lecture/25832-text-free-image-to-speech-synthesis-using-learned-segmental-units
https://dx.doi.org/10.48448/r06d-y818
BASE
7
Generative Spoken Language Modeling from Raw Audio ...
BASE
8
Textless Speech-to-Speech Translation on Real Data ...
BASE
9
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units ...
BASE
10
Textless Speech Emotion Conversion using Discrete and Decomposed Representations ...
BASE
11
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-02912029 ; Interspeech 2020, Oct 2020, Shanghai, China (2020)
BASE
12
Speech processing with less supervision : learning from weak labels and multiple modalities
Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2020
BASE
13
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning ...
BASE
14
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech ...
BASE
15
Transfer Learning from Audio-Visual Grounding to Speech Recognition ...
BASE
16
Unsupervised learning of disentangled representations for speech with neural variational inference models
Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2018
BASE
17
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition ...
Hsu, Wei-Ning; Tang, Hao; Glass, James. - : arXiv, 2018
BASE
18
Unsupervised Representation Learning of Speech for Dialect Identification ...
BASE
19
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
BASE
20
Learning Latent Representations for Speech Generation and Transformation ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
BASE

