1 |
Prosodic Boundary Prediction Model for Vietnamese Text-To-Speech
In: Proc. Interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03329116 ; Interspeech 2021, Aug 2021, Brno, Czech Republic. pp.3885-3889, ⟨10.21437/interspeech.2021-125⟩ (2021)
Abstract:
This research aims to build a prosodic boundary prediction model to improve the naturalness of Vietnamese speech synthesis. The model can be used directly to predict prosodic boundaries in the synthesis phase of statistical parametric or end-to-end speech synthesis systems. Besides conventional features related to Part-Of-Speech (POS), this paper proposes two efficient features for predicting prosodic boundaries, syntactic blocks and syntactic links, based on a thorough analysis of a Vietnamese dataset. Syntactic blocks are syntactic phrases whose sizes are bounded in their constituent syntactic tree. The syntactic link of two adjacent words is calculated from the distance between them in the syntax tree. The experimental results show that the two proposed predictors improve the quality of a boundary prediction model using a decision tree classification algorithm by about 36.4% (F1 score) over the model with only POS features. The final boundary prediction model, which uses POS, syntactic block, and syntactic link features with the LightGBM algorithm, gives the best F1 score, 87.0% on the test data. The proposed model helps TTS systems, whether built with HMM-based, DNN-based, or end-to-end speech synthesis techniques, improve by about 0.3 MOS points (i.e. 6 to 10%) compared to the same systems without the proposed model.
Keywords:
[INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; pause prediction; prosodic boundary; Prosody modeling; speech synthesis; Text-To-Speech; Vietnamese
URL: https://hal.archives-ouvertes.fr/hal-03329116/file/trang21_interspeech.pdf ; https://hal.archives-ouvertes.fr/hal-03329116 ; https://hal.archives-ouvertes.fr/hal-03329116/document ; https://doi.org/10.21437/interspeech.2021-125
2 |
Prosodic Disambiguation Using Chironomic Stylization of Intonation with Native and Non-Native Speakers
In: Proceedings Interspeech 2021 ; Interspeech 2021 ; https://hal.archives-ouvertes.fr/hal-03329111 ; Interspeech 2021, Aug 2021, Brno (virtual), Czech Republic. pp.516-520, ⟨10.21437/Interspeech.2021-182⟩ (2021)
3 |
Perceptual equivalence of the Liljencrants-Fant and linear-filter glottal flow models
In: ISSN: 0001-4966 ; EISSN: 1520-8524 ; Journal of the Acoustical Society of America ; https://hal.archives-ouvertes.fr/hal-03322875 ; Journal of the Acoustical Society of America, Acoustical Society of America, 2021, 150 (2), pp.1273-1285. ⟨10.1121/10.0005879⟩ ; https://doi.org/10.1121/10.0005879 (2021)
4 |
Voks: Digital instruments for chironomic control of voice samples
In: ISSN: 0167-6393 ; EISSN: 1872-7182 ; Speech Communication ; https://hal.archives-ouvertes.fr/hal-03009712 ; Speech Communication, Elsevier : North-Holland, 2020, 125, pp.97 - 113. ⟨10.1016/j.specom.2020.10.002⟩ (2020)
5 |
Les instruments chanteurs
In: ISSN: 1263-8072 ; Acoustique et Techniques : trimestriel d'information des professionnels de l'acoustique ; https://hal.archives-ouvertes.fr/hal-02025861 ; Acoustique et Techniques : trimestriel d'information des professionnels de l'acoustique, Neuilly-sur-Seine : Centre d'information et de documentation sur le bruit, 2019, 89, pp.36-43 (2019)
6 |
T-Voks: the Singing and Speaking Theremin
In: NIME 2019 International Conference on New Interfaces for Musical Expression ; https://hal.archives-ouvertes.fr/hal-02197063 ; NIME 2019 International Conference on New Interfaces for Musical Expression, UFRGS, Jun 2019, Porto Alegre, Brazil. pp.110-115 ; http://www.nime.org/proceedings/2019/nime2019_022.pdf (2019)
10 |
Paradigmatic variation of vowels in expressive speech: Acoustic description and dimensional analysis
In: ISSN: 0001-4966 ; EISSN: 1520-8524 ; Journal of the Acoustical Society of America ; https://hal.archives-ouvertes.fr/hal-01914497 ; Journal of the Acoustical Society of America, Acoustical Society of America, 2018, 143, pp.109-122. ⟨10.1121/1.5018433⟩ (2018)
11 |
Jouer avec les doubles artificiels de la voix: Cantor digitalis et Vokinesis. Conférence-concert
In: La voix à double tranchant ; https://hal.archives-ouvertes.fr/hal-02009009 ; La voix à double tranchant, Solipsy, pp.185-203, 2018, Voix et psychanalyse 2017, 978-2-84932-106-5 (2018)
12 |
Le contrôle des instruments chanteurs
In: Congrès Français d’Acoustique, CFA 2018 ; https://hal.archives-ouvertes.fr/hal-02008980 ; Congrès Français d’Acoustique, CFA 2018, Apr 2018, Le Havre, France. pp.1249-1255 (2018)
14 |
Adjusting the Frame: Biphasic Performative Control of Speech Rhythm
In: Proceedings of Interspeech 2017 ; https://hal.sorbonne-universite.fr/hal-01672232 ; Proceedings of Interspeech 2017, Aug 2017, Stockholm, Sweden. pp.864-868 (2017)
15 |
Vokinesis: Syllabic Control Points For Performative Singing Synthesis ...
17 |
Vocal effort modification for singing synthesis
In: INTERSPEECH 2016 ; Annual Conference of the International Speech Communication Association (INTERSPEECH 2016) ; https://hal.archives-ouvertes.fr/hal-01712564 ; Annual Conference of the International Speech Communication Association (INTERSPEECH 2016), Sep 2016, San Francisco, United States. pp.1235-1239, ⟨10.21437/Interspeech.2016-1096⟩ (2016)
18 |
Emergence de la vocalité : la glossolalie et le musical
In: colloque L’émergence en musique : dialogue des sciences ; https://hal.archives-ouvertes.fr/hal-01712656 ; colloque L’émergence en musique : dialogue des sciences, 2016, Université de Versailles Saint Quentin, France (2016)
19 |
Seeing, listening, drawing: interferences between sensorimotor modalities in the use of a tablet musical interface
In: ISSN: 1544-3558 ; ACM Transactions on Applied Perception ; https://hal.sorbonne-universite.fr/hal-01672241 ; ACM Transactions on Applied Perception, Association for Computing Machinery, 2016, 14 (2), pp.1 - 19. ⟨10.1145/2990501⟩ (2016)
20 |
Target Acquisition vs. Expressive Motion: Dynamic Pitch Warping for Intonation Correction
In: ISSN: 1073-0516 ; EISSN: 1557-7325 ; ACM Transactions on Computer-Human Interaction ; https://hal.sorbonne-universite.fr/hal-01672238 ; ACM Transactions on Computer-Human Interaction, Association for Computing Machinery, 2016, 23 (3), pp.17. ⟨10.1145/2897513⟩ (2016)