81 |
Emotion Intensity and its Control for Emotional Voice Conversion ...
82 |
Automatic Speech recognition for Speech Assessment of Preschool Children ...
83 |
Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents ...
84 |
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics ...
85 |
Automated speech tools for helping communities process restricted-access corpora for language revival efforts ...
87 |
Separate What You Describe: Language-Queried Audio Source Separation ...
88 |
A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition ...
90 |
The First Gospel, the Gospel of the Poor: A New Reconstruction of Q and Resolution of the Synoptic Problem based on Marcion's Early Luke ...
97 |
Measuring the Impact of Individual Domain Factors in Self-Supervised Pre-Training ...
Abstract:
Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. Previous work explores the effect of domain mismatch in automatic speech recognition between pre-training and fine-tuning as a whole but does not dissect the contribution of individual factors. In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations. To do so, we pre-train models either on modified natural speech or synthesized audio, with a single domain factor modified, and then measure performance on automatic speech recognition after fine-tuning. Results show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important. To our knowledge, this is the first study to better understand the domain characteristics in self-supervised pre-training for speech. ... : Submitted to Interspeech 2022 ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://arxiv.org/abs/2203.00648 https://dx.doi.org/10.48550/arxiv.2203.00648
98 |
Low-dimensional representation of infant and adult vocalization acoustics ...
99 |
Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character ...
100 |
Similarity and Content-based Phonetic Self Attention for Speech Recognition ...