
Search in the Catalogues and Directories

Hits 1 – 20 of 21

1
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training ...
Abstract: Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. Previous work explores the effect of domain mismatch in automatic speech recognition between pre-training and fine-tuning as a whole but does not dissect the contribution of individual factors. In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations. To do so, we pre-train models either on modified natural speech or synthesized audio, with a single domain factor modified, and then measure performance on automatic speech recognition after fine-tuning. Results show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important. To our knowledge, this is the first study to better understand the domain characteristics in self-supervised pre-training for speech. ... : Submitted to Interspeech 2022 ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://arxiv.org/abs/2203.00648
https://dx.doi.org/10.48550/arxiv.2203.00648
BASE
2
Simple and Effective Unsupervised Speech Synthesis ...
BASE
3
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
In: INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329245 ; INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic (2021)
BASE
4
On Generative Spoken Language Modeling from Raw Audio
In: EISSN: 2307-387X ; Transactions of the Association for Computational Linguistics ; https://hal.inria.fr/hal-03329219 ; Transactions of the Association for Computational Linguistics, The MIT Press, 2021 (2021)
BASE
5
Generative Spoken Language Modeling from Raw Audio ...
BASE
6
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ...
BASE
7
Generative Spoken Language Modeling from Raw Audio ...
BASE
8
Textless Speech-to-Speech Translation on Real Data ...
BASE
9
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units ...
BASE
10
Textless Speech Emotion Conversion using Discrete and Decomposed Representations ...
BASE
11
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning
In: Interspeech 2020 ; https://hal.archives-ouvertes.fr/hal-02912029 ; Interspeech 2020, Oct 2020, Shanghai, China (2020)
BASE
12
Speech processing with less supervision : learning from weak labels and multiple modalities
Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2020
BASE
13
A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning ...
BASE
14
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech ...
BASE
15
Transfer Learning from Audio-Visual Grounding to Speech Recognition ...
BASE
16
Unsupervised learning of disentangled representations for speech with neural variational inference models
Hsu, Wei-Ning, Ph. D. Massachusetts Institute of Technology. - : Massachusetts Institute of Technology, 2018
BASE
17
Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition ...
Hsu, Wei-Ning; Tang, Hao; Glass, James. - : arXiv, 2018
BASE
18
Unsupervised Representation Learning of Speech for Dialect Identification ...
BASE
19
Unsupervised Learning of Disentangled and Interpretable Representations from Sequential Data ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
BASE
20
Learning Latent Representations for Speech Generation and Transformation ...
Hsu, Wei-Ning; Zhang, Yu; Glass, James. - : arXiv, 2017
BASE
