
Search in the Catalogues and Directories

Hits 1 – 19 of 19

1
Unsupervised word-level prosody tagging for controllable speech synthesis ...
Guo, Yiwei; Du, Chenpeng; Yu, Kai. - : arXiv, 2022
Abstract: Although word-level prosody modeling in neural text-to-speech (TTS) has been investigated in recent research for diverse speech synthesis, it is still challenging to control speech synthesis manually without a specific reference. This is largely due to the lack of word-level prosody tags. In this work, we propose a novel approach for unsupervised word-level prosody tagging with two stages, where we first group the words into different types with a decision tree according to their phonetic content and then cluster the prosodies using GMM within each type of words separately. This design is based on the assumption that the prosodies of different types of words, such as long or short words, should be tagged with different label sets. Furthermore, a TTS system with the derived word-level prosody tags is trained for controllable speech synthesis. Experiments on LJSpeech show that the TTS model trained with word-level prosody tags not only achieves better naturalness than a typical FastSpeech2 model, but also gains the ... : 5 pages, 6 figures, accepted to ICASSP2022 ...
Keyword: Artificial Intelligence cs.AI; Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2202.07200
https://arxiv.org/abs/2202.07200
BASE
2
LET: Linguistic Knowledge Enhanced Graph Transformer for Chinese Short Text Matching ...
Lyu, Boer; Chen, Lu; Zhu, Su. - : arXiv, 2021
BASE
3
Glyph Enhanced Chinese Character Pre-Training for Lexical Sememe Prediction ...
BASE
4
Bridging the Communication Gap between Radiographers and Patients to Improve Chest Radiography Image Acquisition: A Multilingual Solution in the COVID-19 Pandemic
In: Radiography (Lond) (2021)
BASE
5
Towards Universal Dialogue State Tracking ...
Ren, Liliang; Xie, Kaige; Chen, Lu. - : arXiv, 2018
BASE
6
Concept Transfer Learning for Adaptive Language Understanding ...
Zhu, Su; Yu, Kai. - : arXiv, 2017
BASE
7
Differences in Oral Structure and Tissue Interactions during Mouse vs. Human Palatogenesis: Implications for the Translation of Findings from Mice
Yu, Kai; Deng, Mei; Naluai-Cecchini, Theresa. - : Frontiers Media S.A., 2017
BASE
8
Text Flow: A Unified Text Detection System in Natural Scene Images ...
BASE
9
Exercise mode and executive function in older adults: An ERP study of task-switching
In: Brain and cognition. - San Diego, Calif. [u.a.] : Elsevier Science 83 (2013) 2, 153-162
OLC Linguistik
10
Women Leaders of Higher Education: Female Executives in Leading Universities in China
In: Cross-Cultural Communication; Vol 9, No 6 (2013): Cross-Cultural Communication; 40-45 ; 1923-6700 ; 1712-8358 (2013)
BASE
11
Continuous F0 modeling for HMM based statistical parametric speech synthesis
In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 19 (2011) 5, 1071-1079
BLLDB
OLC Linguistik
12
Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
In: Speech communication. - Amsterdam [u.a.] : Elsevier 53 (2011) 6, 914-923
BLLDB
OLC Linguistik
13
Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
In: ISSN: 0167-6393 ; EISSN: 1872-7182 ; Speech Communication ; https://hal.archives-ouvertes.fr/hal-00746106 ; Speech Communication, Elsevier : North-Holland, 2011, ⟨10.1016/j.specom.2011.03.003⟩ (2011)
BASE
14
The hidden information state model: a practical framework for POMDP-based spoken dialogue management
In: Computer speech and language. - Amsterdam [u.a.] : Elsevier 24 (2010) 2, 150-174
BLLDB
OLC Linguistik
15
Unsupervised training and directed manual transcription for LVCSR
In: Speech communication. - Amsterdam [u.a.] : Elsevier 52 (2010) 7, 652-663
BLLDB
OLC Linguistik
16
Phrase-based statistical language generation using graphical models and active learning
In: Association for Computational Linguistics. Proceedings of the conference. - Stroudsburg, Penn. : ACL 48 (2010) 2, 1552-1561
BLLDB
17
Bayesian adaptive inference and adaptive training
In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 15 (2007) 6, 1932-1943
BLLDB
OLC Linguistik
18
Discriminative cluster adaptive training
In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 14 (2006) 5, 1694-1703
BLLDB
OLC Linguistik
19
Recognition of Syllable-Contracted Words in Spontaneous Speech Using Word Expansion and Duration Information
In: http://isca-speech.org/archive_open/archive_papers/iscslp2008/225.pdf
BASE

© 2013 - 2024 Lin|gu|is|tik