
Search in the Catalogues and Directories

Hits 1 – 7 of 7

1
Speaker reliability effect on adult cross-situational word learning ...
Rivera-Vera, Natalia. - : Open Science Framework, 2021
BASE
2
Integrating a voice analysis-synthesis system with a TTS framework for controlling affect and speaker identity ; 2021 32nd Irish Signals and Systems Conference (ISSC)
BASE
3
Perceptual characteristics of unstressed reduced English vowels [ПЕРЦЕПТИВНЫЕ ХАРАКТЕРИСТИКИ БЕЗУДАРНЫХ РЕДУЦИРОВАННЫХ АНГЛИЙСКИХ ГЛАСНЫХ]
Androsova, Svetlana Viktorovna; Guseva, Svetlana Ivanovna; Morozova, Olga Nikolaevna. - : Gramota Publishing House LLC (Издательство Грамота), 2015
BASE
4
The impact of rater characteristics on oral assessments of second language proficiency
Su, Yi-Wen. - 2014
BASE
5
Accounting for Individual Speaker Properties in Automatic Speech Recognition
Elenius, Daniel. - : KTH, Tal-kommunikation, 2010. : Stockholm : KTH Royal Institute of Technology, 2010
Abstract: In this work, speaker characteristic modeling has been applied in the fields of automatic speech recognition (ASR) and automatic speaker verification (ASV). In ASR, a key problem is that acoustic mismatch between training and test conditions degrades classification performance. In this work, a child exemplifies a speaker not represented in training data, and methods to reduce the spectral mismatch are devised and evaluated. To reduce the acoustic mismatch, predictive modeling based on spectral speech transformation is applied. Following this approach, a model suitable for a target speaker, not well represented in the training data, is estimated and synthesized by applying vocal tract predictive modeling (VTPM). In this thesis, the traditional static modeling on the utterance level is extended to dynamic modeling. This is accomplished by operating also on sub-utterance units, such as phonemes, phone-realizations, sub-phone realizations and sound frames. Initial experiments show that adaptation of an acoustic model trained on adult speech significantly reduced the word error rate of ASR for children, but not to the level of a model trained on children’s speech. Multi-speaker-group training provided an acoustic model that performed recognition for both adults and children within the same model at almost the same accuracy as speaker-group dedicated models, with no added model complexity. In the analysis of the cause of errors, body height of the child was shown to be correlated with word error rate. A further result is that the computationally demanding iterative recognition process in standard VTLN can be replaced by synthetically extending the vocal tract length distribution in the training data. A multi-warp model is trained on the extended data and recognition is performed in a single pass. The accuracy is similar to that of the standard technique.
A concluding experiment in ASR shows that the word error rate can be reduced by extending a static vocal tract length compensation parameter into a temporal parameter track. A key component to reach this improvement was provided by a novel joint two-level optimization process. In the process, the track was determined as a composition of a static and a dynamic component, which were simultaneously optimized on the utterance and sub-utterance level respectively. This had the principal advantage of limiting the modulation amplitude of the track to what is realistic for an individual speaker. The recognition error rate was reduced by 10% relative compared with that of a standard utterance-specific estimation technique. The techniques devised and evaluated can also be applied to other speaker characteristic properties that exhibit a dynamic nature. An excursion into ASV led to the proposal of a statistical speaker population model. The model represents an alternative approach for determining the reject/accept threshold in an ASV system instead of the commonly used direct estimation on a set of client and impostor utterances. This is especially valuable in applications where a low false reject or false accept rate is required. In these cases, the number of errors is often too few to estimate a reliable threshold using the direct method. The results are encouraging but need to be verified on a larger database. ; QC 20110502 ; Pf-Star ; KOBRA
Keyword: child; dynamic modeling; Language Technology (Computational Linguistics); MAP; MLLR; speaker characteristics; Språkteknologi (språkvetenskaplig databehandling); VTLN
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-12258
BASE
6
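Economic importance of the phonetic characteristics of the Spanish language and its regional varieties [Importancia económica de las características fonéticas del idioma español y sus variedades regionales]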
Coloma, Germán. - : Buenos Aires: Universidad del Centro de Estudios Macroeconómicos de Argentina (UCEMA), 2010
BASE
7
Speech Communication
Klatt, Dennis H.; Vaissière, Jacqueline; Henke, William L.. - : Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT), 1975
BASE

© 2013 - 2024 Lin|gu|is|tik