Search in the Catalogues and Directories

Hits 1 – 20 of 119

1
Learning disentangled speech representations
Williams, Jennifer. - : The University of Edinburgh, 2022
BASE
2
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
BASE
3
Persuasive synthetic speech : voice perception and user behaviour
BASE
4
Prosody generation for text-to-speech synthesis
Ronanki, Srikanth. - : The University of Edinburgh, 2019
BASE
5
Unsupervised neural and Bayesian models for zero-resource speech processing
Kamper, Herman. - : The University of Edinburgh, 2017
BASE
6
Statistical parametric speech synthesis using conversational data and phenomena
Dall, Rasmus. - : The University of Edinburgh, 2017
BASE
7
Overcoming the limitations of statistical parametric speech synthesis
Merritt, Thomas. - : The University of Edinburgh, 2017
BASE
8
Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training
Wu, Zhizheng; King, Simon. - : arXiv, 2016
BASE
9
DNN-based Speech Synthesis for Indian Languages from ASCII text
BASE
10
Speech segmentation and speaker diarisation for transcription and translation
Sinclair, Mark. - : The University of Edinburgh, 2016
Abstract: This dissertation outlines work related to Speech Segmentation – segmenting an audio recording into regions of speech and non-speech – and Speaker Diarization – further segmenting those regions into those pertaining to homogeneous speakers. Knowing not only what was said but also who said it and when has many useful applications. As well as providing a richer level of transcription for speech, we will show how such knowledge can improve Automatic Speech Recognition (ASR) system performance and can also benefit downstream Natural Language Processing (NLP) tasks such as machine translation and punctuation restoration.

While segmentation and diarization may appear to be relatively simple tasks to describe, in practice we find that they are very challenging and are, in general, ill-defined problems. Therefore, we first provide a formalisation of each of the problems as the sub-division of speech within acoustic space and time. Here, we see that the task can become very difficult when we want to partition this domain into our target classes of speakers whilst avoiding other classes that reside in the same space, such as phonemes. We present a theoretical framework for describing and discussing the tasks, as well as introducing existing state-of-the-art methods and research.

Current Speaker Diarization systems are notoriously sensitive to hyper-parameters and lack robustness across datasets. Therefore, we present a method which uses a series of oracle experiments to expose the limitations of current systems and to attribute those limitations to specific system components. We also demonstrate how Diarization Error Rate (DER), the dominant error metric in the literature, is not a comprehensive or reliable indicator of overall performance or of error propagation to subsequent downstream tasks. These results inform our subsequent research.

We find that, as a precursor to Speaker Diarization, the task of Speech Segmentation is a crucial first step in the system chain. Current methods typically do not account for the inherent structure of spoken discourse. As such, we explored a novel method which exploits an utterance-duration prior in order to better model the segment distribution of speech. We show how this method improves not only segmentation, but also the performance of subsequent speech recognition, machine translation and speaker diarization systems.

Typical ASR transcriptions do not include punctuation, and the task of enriching transcriptions with this information is known as ‘punctuation restoration’. The benefit is not only improved readability but also better compatibility with NLP systems that expect sentence-like units, such as in conventional machine translation. We show how segmentation and diarization are related tasks that are able to contribute acoustic information that complements existing linguistically-based punctuation approaches.

There is a growing demand for speech technology applications in the broadcast media domain, which presents many new challenges including diverse noise and recording conditions. We show that the capacity of existing GMM-HMM based speech segmentation systems is limited for such scenarios, and we present a Deep Neural Network (DNN) based method which offers more robust speech segmentation and improved speech recognition performance for a television broadcast dataset.

Ultimately, we are able to show that speech segmentation is an inherently ill-defined problem whose solution is highly dependent on the downstream task it is intended for.
Keyword: Speaker Diarization; Speech Activity Detection; speech segmentation
URL: http://hdl.handle.net/1842/20970
BASE
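The abstract above treats the Diarization Error Rate (DER) as the dominant metric for speaker diarization. As a rough, frame-level illustration of how DER is computed – missed speech, false-alarm speech and speaker confusion, divided by total reference speaker time – the following Python sketch may help. It is not taken from the dissertation; it assumes hypothesis speaker labels are already mapped to the reference labels and omits the forgiveness collar applied by standard scoring tools such as NIST's md-eval.

def der(reference, hypothesis, frame=0.01):
    """reference / hypothesis: lists of (start_sec, end_sec, speaker_label)."""
    def to_frames(segments):
        labels = {}
        for start, end, spk in segments:
            for i in range(round(start / frame), round(end / frame)):
                labels.setdefault(i, set()).add(spk)
        return labels

    ref, hyp = to_frames(reference), to_frames(hypothesis)
    missed = false_alarm = confusion = 0
    for i in set(ref) | set(hyp):
        r, h = ref.get(i, set()), hyp.get(i, set())
        missed += max(len(r) - len(h), 0)             # reference speech with no hypothesis speaker
        false_alarm += max(len(h) - len(r), 0)        # hypothesis speech with no reference speaker
        confusion += min(len(r), len(h)) - len(r & h) # speech attributed to the wrong speaker
    total_ref = sum(len(s) for s in ref.values())     # total reference speaker time, in frames
    return (missed + false_alarm + confusion) / total_ref if total_ref else 0.0

# Toy example: two reference speakers in a 10-second recording and an
# imperfect hypothesis that mislabels one second and misses another.
ref_segments = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
hyp_segments = [(0.0, 6.0, "A"), (6.0, 9.0, "B")]
print(f"DER = {der(ref_segments, hyp_segments):.2%}")  # DER = 20.00%

In the toy example, one second of speaker confusion plus one second of missed speech against ten seconds of reference speech gives a DER of 20%.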
11
Intelligibility enhancement of HMM-generated speech in additive noise by modifying Mel cepstral coefficients to increase the glimpse proportion
In: Computer speech and language. - Amsterdam [etc.] : Elsevier 28 (2014) 2, 665-686
OLC Linguistik
12
Introduction to the Special Issue on The listening talker: context-dependent speech production and perception
In: Computer speech and language. - Amsterdam [etc.] : Elsevier 28 (2014) 2, 540-542
OLC Linguistik
13
The listening talker: A review of human and algorithmic context-induced modifications of speech
In: Computer speech and language. - Amsterdam [etc.] : Elsevier 28 (2014) 2, 543-571
OLC Linguistik
14
Statistical parametric speech synthesis for Ibibio
In: Speech communication. - Amsterdam [etc.] : Elsevier 56 (2014), 243-251
OLC Linguistik
15
The listening talker: A review of human and algorithmic context-induced modifications of speech
In: Computer Speech and Language, Elsevier, 2014, 28 (2), pp. 543-571. ISSN 0885-2308, EISSN 1095-8363. doi:10.1016/j.csl.2013.08.003. https://hal.archives-ouvertes.fr/hal-00874986
BASE
16
EUSTACE : Edinburgh University speech timing archive and corpus of English
White, Laurence; King, Simon. - : Centre for Speech Technology Research, University of Edinburgh, 2014
BASE
17
Measuring a decade of progress in Text-to-Speech
In: Loquens, Vol. 1, No. 1 (2014), e006. ISSN 2386-2637. doi:10.3989/loquens.2014.v1.i1
BASE
18
Feature analysis for discriminative confidence estimation in spoken term detection
BASE
19
A Comparison of Open-Source Segmentation Architectures for Dealing with Imperfect Data from the Media in Speech Synthesis
Gallardo Antolín, Ascensión; Montero, Juan Manuel; King, Simon. - : International Speech Communication Association, 2014
BASE
20
The listening talker : a review of human and algorithmic context-induced modifications of speech
Cooke, Martin; King, Simon; Garnier, Maëva. - : U.K., Academic Press, 2014
BASE

