Search in the Catalogues and Directories

Hits 1 – 20 of 119

1
Learning disentangled speech representations
Williams, Jennifer. - : The University of Edinburgh, 2022
BASE
2
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
BASE
3
Persuasive synthetic speech : voice perception and user behaviour
BASE
4
Prosody generation for text-to-speech synthesis
Ronanki, Srikanth. - : The University of Edinburgh, 2019
BASE
5
Unsupervised neural and Bayesian models for zero-resource speech processing
Kamper, Herman. - : The University of Edinburgh, 2017
BASE
6
Statistical parametric speech synthesis using conversational data and phenomena
Dall, Rasmus. - : The University of Edinburgh, 2017
BASE
7
Overcoming the limitations of statistical parametric speech synthesis
Merritt, Thomas. - : The University of Edinburgh, 2017
BASE
8
Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training
Wu, Zhizheng; King, Simon. - : arXiv, 2016
BASE
9
DNN-based Speech Synthesis for Indian Languages from ASCII text
BASE
10
Speech segmentation and speaker diarisation for transcription and translation
Sinclair, Mark. - : The University of Edinburgh, 2016
Abstract: This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech – and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and punctuation restoration.

While segmentation and diarization may appear to be relatively simple tasks to describe, in practice we find that they are very challenging and are, in general, ill-defined problems. Therefore, we first provide a formalisation of each of the problems as the sub-division of speech within acoustic space and time. Here, we see that the task can become very difficult when we want to partition this domain into our target classes of speakers whilst avoiding other classes that reside in the same space, such as phonemes. We present a theoretical framework for describing and discussing the tasks, as well as introducing existing state-of-the-art methods and research.

Current Speaker Diarization systems are notoriously sensitive to hyper-parameters and lack robustness across datasets. Therefore, we present a method which uses a series of oracle experiments to expose the limitations of current systems and to attribute those limitations to specific system components. We also demonstrate how Diarization Error Rate (DER), the dominant error metric in the literature, is not a comprehensive or reliable indicator of overall performance or of error propagation to subsequent downstream tasks. These results inform our subsequent research.

We find that, as a precursor to Speaker Diarization, the task of Speech Segmentation is a crucial first step in the system chain. Current methods typically do not account for the inherent structure of spoken discourse. As such, we explored a novel method which exploits an utterance-duration prior in order to better model the segment distribution of speech. We show how this method improves not only segmentation, but also the performance of subsequent speech recognition, machine translation and speaker diarization systems.

Typical ASR transcriptions do not include punctuation, and the task of enriching transcriptions with this information is known as ‘punctuation restoration’. The benefit is not only improved readability but also better compatibility with NLP systems that expect sentence-like units, such as in conventional machine translation. We show how segmentation and diarization are related tasks that are able to contribute acoustic information that complements existing linguistically-based punctuation approaches.

There is a growing demand for speech technology applications in the broadcast media domain, which presents many new challenges including diverse noise and recording conditions. We show that the capacity of existing GMM-HMM based speech segmentation systems is limited for such scenarios, and we present a Deep Neural Network (DNN) based method which offers more robust speech segmentation and improved speech recognition performance for a television broadcast dataset.

Ultimately, we are able to show that speech segmentation is an inherently ill-defined problem whose solution is highly dependent on the downstream task it is intended for.
Keyword: Speaker Diarization; Speech Activity Detection; speech segmentation
URL: http://hdl.handle.net/1842/20970
BASE
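The abstract above treats the Diarization Error Rate (DER) as the dominant metric for speaker diarization. As a rough, frame-level illustration of how DER is computed – missed speech, false-alarm speech and speaker confusion, divided by total reference speaker time – the following Python sketch may help. It is not taken from the dissertation; it assumes hypothesis speaker labels are already mapped to the reference labels and omits the forgiveness collar applied by standard scoring tools such as NIST's md-eval.

def der(reference, hypothesis, frame=0.01):
    """reference / hypothesis: lists of (start_sec, end_sec, speaker_label)."""
    def to_frames(segments):
        labels = {}
        for start, end, spk in segments:
            for i in range(round(start / frame), round(end / frame)):
                labels.setdefault(i, set()).add(spk)
        return labels

    ref, hyp = to_frames(reference), to_frames(hypothesis)
    missed = false_alarm = confusion = 0
    for i in set(ref) | set(hyp):
        r, h = ref.get(i, set()), hyp.get(i, set())
        missed += max(len(r) - len(h), 0)             # reference speech with no hypothesis speaker
        false_alarm += max(len(h) - len(r), 0)        # hypothesis speech with no reference speaker
        confusion += min(len(r), len(h)) - len(r & h) # speech attributed to the wrong speaker
    total_ref = sum(len(s) for s in ref.values())     # total reference speaker time, in frames
    return (missed + false_alarm + confusion) / total_ref if total_ref else 0.0

# Toy example: two reference speakers in a 10-second recording and an
# imperfect hypothesis that mislabels one second and misses another.
ref_segments = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hyp_segments = [(0.0, 6.0, "A"), (6.0, 9.0, "B")]
print(f"DER = {der(ref_segments, hyp_segments):.2%}")  # DER = 20.00%

In the toy example, one second of speaker confusion plus one second of missed speech against ten seconds of reference speech gives a DER of 20%.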
11
Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion
In: Computer speech and language. - Amsterdam [etc.] : Elsevier 28 (2014) 2, 665-686
OLC Linguistik
12
Introduction to the Special Issue on The listening talker: context-dependent speech production and perception
In: Computer speech and language. - Amsterdam [etc.] : Elsevier 28 (2014) 2, 540-542
OLC Linguistik
13
The listening talker: A review of human and algorithmic context-induced modifications of speech
In: Computer speech and language. - Amsterdam [etc.] : Elsevier 28 (2014) 2, 543-571
OLC Linguistik
14
Statistical parametric speech synthesis for Ibibio
In: Speech communication. - Amsterdam [etc.] : Elsevier 56 (2014), 243-251
OLC Linguistik
15
The listening talker: A review of human and algorithmic context-induced modifications of speech
In: Computer Speech and Language, Elsevier, 2014, 28 (2), pp. 543-571. ISSN 0885-2308, EISSN 1095-8363. doi:10.1016/j.csl.2013.08.003. https://hal.archives-ouvertes.fr/hal-00874986
BASE
16
EUSTACE : Edinburgh University speech timing archive and corpus of English
White, Laurence; King, Simon. - : Centre for Speech Technology Research, University of Edinburgh, 2014
BASE
17
Measuring a decade of progress in Text-to-Speech
In: Loquens, Vol. 1, No. 1 (2014), e006. ISSN 2386-2637. doi:10.3989/loquens.2014.v1.i1
BASE
18
Feature analysis for discriminative confidence estimation in spoken term detection
BASE
19
A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis
Gallardo Antolín, Ascensión; Montero, Juan Manuel; King, Simon. - : International Speech Communication Association, 2014
BASE
20
The listening talker : a review of human and algorithmic context-induced modifications of speech
Cooke, Martin; King, Simon; Garnier, Maëva. - : U.K., Academic Press, 2014
BASE

