
Hits 1 – 20 of 21

1	Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training ...
	Sanabria, Ramon; Hsu, Wei-Ning; Baevski, Alexei. - : arXiv, 2022
	BASE

2	Simple and Effective Unsupervised Speech Synthesis ...
	Liu, Alexander H.; Lai, Cheng-I Jeff; Hsu, Wei-Ning. - : arXiv, 2022
	BASE

3	Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
	Polyak, Adam; Adi, Yossi; Copet, Jade...
	In: INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329245 ; INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic (2021)
	BASE

4	On Generative Spoken Language Modeling from Raw Audio
	Lakhotia, Kushal; Kharitonov, Evgeny; Hsu, Wei-Ning...
	In: EISSN: 2307-387X ; Transactions of the Association for Computational Linguistics ; https://hal.inria.fr/hal-03329219 ; Transactions of the Association for Computational Linguistics, The MIT Press, 2021 (2021)
	BASE

5	Generative Spoken Language Modeling from Raw Audio ...
	Lakhotia, Kushal; Kharitonov, Evgeny; Hsu, Wei-Ning. - : arXiv, 2021
	BASE

6	Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ...
	Glass, James; Harwath, David; Hsu, Wei-Ning; Miller, Tyler; Song, Christopher. - : Underline Science Inc., 2021
	In: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 2021
	Read paper: https://www.aclanthology.org/2021.acl-long.411
	Abstract: In this paper we present the first model for directly synthesizing fluent, natural-sounding spoken audio captions for images that does not require natural language text as an intermediate representation or source of supervision. Instead, we connect the image captioning module and the speech synthesis module with a set of discrete, sub-word speech units that are discovered with a self-supervised visual grounding task. We conduct experiments on the Flickr8k spoken caption dataset in addition to a novel corpus of spoken audio captions collected for the popular MSCOCO dataset, demonstrating that our generated captions also capture diverse visual semantics of the images they describe. We investigate several different intermediate speech representations, and empirically find that the representation must satisfy several important properties to serve as drop-in replacements for text. ...
	Keyword: Computational Linguistics; Condensed Matter Physics; Deep Learning; Electromagnetism; FOS Physical sciences; Information and Knowledge Engineering; Neural Network; Semantics
	URL: https://underline.io/lecture/25832-text-free-image-to-speech-synthesis-using-learned-segmental-units https://dx.doi.org/10.48448/r06d-y818
	BASE

7	Generative Spoken Language Modeling from Raw Audio ...
	Adi, Yossi; Baevski, Alexei. - : Underline Science Inc., 2021
	In: The 2021 Conference on Empirical Methods in Natural Language Processing 2021
	BASE

8	Textless Speech-to-Speech Translation on Real Data ...
	Lee, Ann; Gong, Hongyu; Duquenne, Paul-Ambroise. - : arXiv, 2021
	BASE

9	HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units ...
	Hsu, Wei-Ning; Bolte, Benjamin; Tsai, Yao-Hung Hubert. - : arXiv, 2021
	BASE

10	Textless Speech Emotion Conversion using Discrete and Decomposed Representations ...
	Kreuk, Felix; Polyak, Adam; Copet, Jade. - : arXiv, 2021
	BASE

11	A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
	Khurana, Sameer; Laurent, Antoine; Hsu, Wei-Ning...
	In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-02912029 ; Interspeech 2020, Oct 2020, Shanghai, China (2020)
	BASE

12	Speech processing with less supervision : learning from weak labels and multiple modalities
	Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2020
	BASE

13	A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning ...
	Khurana, Sameer; Laurent, Antoine; Hsu, Wei-Ning. - : arXiv, 2020
	BASE

14	Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech ...
	Harwath, David; Hsu, Wei-Ning; Glass, James. - : arXiv, 2019
	BASE

15	Transfer Learning from Audio-Visual Grounding to Speech Recognition ...
	Hsu, Wei-Ning; Harwath, David; Glass, James. - : arXiv, 2019
	BASE

16	Unsupervised learning of disentangled representations for speech with neural variational inference models
	Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2018
	BASE

17	Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition ...
	Hsu, Wei-Ning; Tang, Hao; Glass, James. - : arXiv, 2018
	BASE

18	Unsupervised Representation Learning of Speech for Dialect Identification ...
	Shon, Suwon; Hsu, Wei-Ning; Glass, James. - : arXiv, 2018
	BASE

19	Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data ...
	Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
	BASE

20	Learning Latent Representations for Speech Generation and Transformation ...
	Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
	BASE
