
Hits 1 – 20 of 21

1	Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training ...
	Sanabria, Ramon; Hsu, Wei-Ning; Baevski, Alexei; Auli, Michael. - : arXiv, 2022
	Abstract: Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. Previous work explores the effect of domain mismatch in automatic speech recognition between pre-training and fine-tuning as a whole but does not dissect the contribution of individual factors. In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations. To do so, we pre-train models either on modified natural speech or synthesized audio, with a single domain factor modified, and then measure performance on automatic speech recognition after fine-tuning. Results show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important. To our knowledge, this is the first study to better understand the domain characteristics in self-supervised pre-training for speech. ... : Submitted to Interspeech 2022 ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://arxiv.org/abs/2203.00648 https://dx.doi.org/10.48550/arxiv.2203.00648
	BASE

2	Simple and Effective Unsupervised Speech Synthesis ...
	Liu, Alexander H.; Lai, Cheng-I Jeff; Hsu, Wei-Ning. - : arXiv, 2022
	BASE

3	Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
	Polyak, Adam; Adi, Yossi; Copet, Jade...
	In: INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic ; https://hal.inria.fr/hal-03329245
	BASE

4	On Generative Spoken Language Modeling from Raw Audio
	Lakhotia, Kushal; Kharitonov, Evgeny; Hsu, Wei-Ning...
	In: Transactions of the Association for Computational Linguistics, The MIT Press, 2021 ; EISSN: 2307-387X ; https://hal.inria.fr/hal-03329219
	BASE

5	Generative Spoken Language Modeling from Raw Audio ...
	Lakhotia, Kushal; Kharitonov, Evgeny; Hsu, Wei-Ning. - : arXiv, 2021
	BASE

6	Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ...
	Glass, James; Harwath, David. - : Underline Science Inc., 2021
	In: The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 2021
	BASE

7	Generative Spoken Language Modeling from Raw Audio ...
	Adi, Yossi; Baevski, Alexei. - : Underline Science Inc., 2021
	In: The 2021 Conference on Empirical Methods in Natural Language Processing
	BASE

8	Textless Speech-to-Speech Translation on Real Data ...
	Lee, Ann; Gong, Hongyu; Duquenne, Paul-Ambroise. - : arXiv, 2021
	BASE

9	HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units ...
	Hsu, Wei-Ning; Bolte, Benjamin; Tsai, Yao-Hung Hubert. - : arXiv, 2021
	BASE

10	Textless Speech Emotion Conversion using Discrete and Decomposed Representations ...
	Kreuk, Felix; Polyak, Adam; Copet, Jade. - : arXiv, 2021
	BASE

11	A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
	Khurana, Sameer; Laurent, Antoine; Hsu, Wei-Ning...
	In: Interspeech 2020, Oct 2020, Shanghai, China ; https://hal.archives-ouvertes.fr/hal-02912029
	BASE

12	Speech processing with less supervision : learning from weak labels and multiple modalities
	Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2020
	BASE

13	A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning ...
	Khurana, Sameer; Laurent, Antoine; Hsu, Wei-Ning. - : arXiv, 2020
	BASE

14	Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech ...
	Harwath, David; Hsu, Wei-Ning; Glass, James. - : arXiv, 2019
	BASE

15	Transfer Learning from Audio-Visual Grounding to Speech Recognition ...
	Hsu, Wei-Ning; Harwath, David; Glass, James. - : arXiv, 2019
	BASE

16	Unsupervised learning of disentangled representations for speech with neural variational inference models
	Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2018
	BASE

17	Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition ...
	Hsu, Wei-Ning; Tang, Hao; Glass, James. - : arXiv, 2018
	BASE

18	Unsupervised Representation Learning of Speech for Dialect Identification ...
	Shon, Suwon; Hsu, Wei-Ning; Glass, James. - : arXiv, 2018
	BASE

19	Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data ...
	Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
	BASE

20	Learning Latent Representations for Speech Generation and Transformation ...
	Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
	BASE
