1 |
Human-in-the-Loop Efficiency Analysis for Binary Classification in Edyson
BASE
2 |
Self-perceived preferences of voice and speaking style characteristics in spoken text
BASE
3 |
Simulating Speech Error Patterns Across Languages and Different Datasets
In: Lang Speech (2021)
BASE
4 |
Augmented Prompt Selection for Evaluation of Spontaneous Speech Synthesis
Abstract:
By definition, spontaneous speech is unscripted and created on the fly by the speaker. It is dramatically different from read speech, where the words are authored as text before they are spoken. Spontaneous speech is emergent and transient, whereas text read out loud is pre-planned. For this reason, it is unsuitable to evaluate the usability and appropriateness of spontaneous speech synthesis by having it read out written texts sampled from, for example, newspapers or books. Instead, we need to use transcriptions of speech as the target, something that is much less readily available. In this paper, we introduce Starmap, a tool allowing developers to select a varied, representative set of utterances from a spoken genre, to be used for evaluation of TTS for a given domain. The selection can be done from any speech recording, without the need for transcription. The tool uses interactive visualisation of prosodic features with t-SNE, along with a tree-based algorithm to guide the user through thousands of utterances and ensure coverage of a variety of prompts. A listening test has shown that with a selection of genre-specific utterances, it is possible to show significant differences across genres between two synthetic voices built from spontaneous speech.
Keyword:
Engineering and Technology; evaluation; human-in-the-loop; intelligence augmentation; spontaneous speech synthesis
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283733
BASE
5 |
Syllabification of conversational speech using bidirectional long-short-term memory neural networks
BASE
6 |
400 voices in a jiffy: a verification of the Cocktail experiment platform ...
BASE
7 |
400 voices in a jiffy: a verification of the Cocktail experiment platform ...
BASE
9 |
Preliminary guidelines for the efficient management of OOV words for spoken text
BASE
10 |
The State of Speech in HCI: Trends, Themes and Challenges ...
BASE
14 |
Towards Metadata Descriptions for Multimodal Corpora of Natural Communication Data
BASE
16 |
Gesture movement profiles in dialogues from a Swedish multimodal database of spontaneous speech
In: Prosody and embodiment in interactional grammar (2012)
IDS Mannheim