1 |
Some consideration on expressive audiovisual speech corpus acquisition using a multimodal platform
|
|
|
|
In: ISSN: 1574-020X ; EISSN: 1574-0218 ; Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-02907046 ; Language Resources and Evaluation, Springer Verlag, 2020, ⟨10.1007/s10579-020-09500-w⟩ ; https://link.springer.com/article/10.1007%2Fs10579-020-09500-w (2020)
|
|
BASE
2 |
Can we Generate Emotional Pronunciations for Expressive Speech Synthesis?
|
|
|
|
In: ISSN: 1949-3045 ; IEEE Transactions on Affective Computing ; https://hal.archives-ouvertes.fr/hal-01802463 ; IEEE Transactions on Affective Computing, Institute of Electrical and Electronics Engineers, 2020, 11 (4), pp.684-695. ⟨10.1109/TAFFC.2018.2828429⟩ (2020)
|
|
BASE
3 |
Introducing Prosodic Speaker Identity for a Better Expressive Speech Synthesis Control
|
|
|
|
In: 10th International Conference on Speech Prosody 2020 ; https://hal.archives-ouvertes.fr/hal-03000148 ; 10th International Conference on Speech Prosody 2020, May 2020, Tokyo, Japan. pp.935-939, ⟨10.21437/speechprosody.2020-191⟩ (2020)
|
|
BASE
4 |
Communicating with robots: What we do wrong and what we do right in artificial social intelligence, and what we need to do better
|
|
|
|
BASE
5 |
Characterizing the Expressivity of Game Description Languages
|
|
|
|
In: PRICAI 2019: Trends in Artificial Intelligence: 16th Pacific Rim International Conference on Artificial Intelligence, Cuvu, Yanuca Island, Fiji, August 26–30, 2019, Proceedings ; https://hal.archives-ouvertes.fr/hal-03594792 ; PRICAI 2019: Trends in Artificial Intelligence: 16th Pacific Rim International Conference on Artificial Intelligence, Cuvu, Yanuca Island, Fiji, August 26–30, 2019, Proceedings, 11670, Springer International Publishing, pp.597-611, 2019, Lecture Notes in Computer Science book series, 978-3-030-29907-1. ⟨10.1007/978-3-030-29908-8_47⟩ ; https://link.springer.com/chapter/10.1007/978-3-030-29908-8_47 (2019)
|
|
BASE
6 |
Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis
|
|
|
|
In: INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-02175776 ; INTERSPEECH 2019 - 20th Annual Conference of the International Speech Communication Association, Sep 2019, Graz, Austria (2019)
|
|
BASE
7 |
Discourse phrases classification: direct vs. narrative audio speech
|
|
|
|
In: Speech Prosody ; https://hal.archives-ouvertes.fr/hal-01790910 ; Speech Prosody, Jun 2018, Poznan, Poland (2018)
|
|
BASE
8 |
Pronunciation and disfluency modeling for expressive speech synthesis ; Modélisation de la prononciation et des disfluences pour la synthèse de la parole expressive
|
|
|
|
In: https://hal.inria.fr/tel-01668014 ; Artificial Intelligence [cs.AI]. Université Rennes 1, 2017. English. ⟨NNT : 2017REN1S076⟩ (2017)
|
|
BASE
9 |
Perception of expressivity in TTS: linguistics, phonetics or prosody?
|
|
|
|
In: Statistical Language and Speech Processing ; https://hal-univ-lemans.archives-ouvertes.fr/hal-01623916 ; Statistical Language and Speech Processing, Oct 2017, Le Mans, France. pp.262-274, ⟨10.1007/978-3-319-68456-7_22⟩ ; http://grammars.grlmc.com/SLSP2017/index.php (2017)
|
|
Abstract:
Currently, much of the work on expressive speech focuses on acoustic models and prosody variations. However, in expressive Text-to-Speech (TTS) systems, prosody generation strongly relies on the sequence of phonemes to be expressed and also on the words underlying these phonemes. Consequently, linguistic and phonetic cues play a significant role in the perception of expressivity. In previous work, we proposed a statistical corpus-specific framework which adapts phonemes derived from an automatic phonetizer to the phonemes as labelled in the TTS speech corpus. This framework makes it possible to synthesize good-quality but neutral speech samples. The present study goes further in the generation of expressive speech by predicting not only corpus-specific but also expressive pronunciation. It also investigates the shared impacts of linguistics, phonetics and prosody, these impacts being evaluated on different French neutral and expressive speech collected with different speaking styles and linguistic content and expressed under diverse emotional states. Perception tests show that expressivity is more easily perceived when linguistics, phonetics and prosody are consistent. Linguistics seems to be the strongest cue in the perception of expressivity, but phonetics greatly improves expressiveness when combined with an adequate prosody.
|
|
Keyword:
[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing/I.2.7.5: Speech recognition and synthesis; Expressive speech synthesis; Linguistics; Perception; Phonetics-phonology; Pronunciation adaptation; Prosody
|
|
URL: https://hal-univ-lemans.archives-ouvertes.fr/hal-01623916v3/document https://hal-univ-lemans.archives-ouvertes.fr/hal-01623916v3/file/SLSP2017_Tahon_final.pdf https://doi.org/10.1007/978-3-319-68456-7_22 https://hal-univ-lemans.archives-ouvertes.fr/hal-01623916
|
|
BASE
10 |
IMMERSE: Interactive Mentoring for Multimodal Experiences in Realistic Social Encounters
|
|
|
|
In: DTIC (2015)
|
|
BASE