
Search in the Catalogues and Directories

Hits 1 – 20 of 22

1
Repairing Swedish Automatic Speech Recognition ; Correction of Automatic Speech Recognition for Swedish
Rehn, Karla. - : KTH, School of Electrical Engineering and Computer Science (EECS), 2021
Abstract: The quality of automatic speech recognition has increased dramatically over the last few years, but performance for low- and middle-resource languages such as Swedish is still far from optimal. In this project, KB-BERT, a language model trained on large written corpora, is used to improve the quality of Swedish transcriptions. The large language model is inserted as a repair module after the automatic speech recognition, aiming to repair the original output into a transcription that more closely resembles the ground truth by using a sequence-to-sequence translation approach. Two automatic speech recognition models are used to transcribe the speech: one was developed in this project using the Kaldi framework, and the other is Microsoft's Azure Speech to Text platform. The performance of the translator is evaluated on four datasets, three of read speech and one of spontaneous speech. The spontaneous speech and one of the read datasets include both native and non-native speakers. Performance is measured with three metrics: word error rate, a weighted word error rate, and semantic similarity. The repairs significantly improve the transcriptions of two of the read speech datasets, decreasing the word error rate from 13.69% to 3.05% and from 36.23% to 21.17%. On the spontaneous speech data, the repairs reduce the word error rate only marginally, from 44.38% to 44.06%, and on the last read dataset they fail, instead increasing the word error rate. The lower performance on that dataset is likely due to a lack of data. ; Automatic speech recognition has improved in recent years, but for small languages such as Swedish, performance is still far from optimal. This project uses KB-BERT, a neural language model trained on large amounts of written text, to improve the quality of transcriptions of Swedish speech.
The transcriptions come from two different speech recognition models: one developed in this project with the Kaldi software library, and Microsoft Azure's speech-to-text platform. The transcriptions are repaired with a sequence-to-sequence translation model, and KB-BERT is used to initialize that model. The translation maps the original transcription produced by one of the speech-to-text models to a transcription that more closely resembles the correct, actual transcription. The quality of the repairs is evaluated with three different metrics on four different datasets. Three of the datasets are read speech and the fourth is spontaneous; the spontaneous speech and one of the read datasets come both from speakers with Swedish as their first language and from speakers with Swedish as a second language. The three metrics are word error rate, a weighted word error rate, and a measure of semantic similarity. The repairs markedly improve the transcriptions from two of the read datasets, lowering the word error rate from 13.69% to 3.05% and from 36.23% to 21.17%. On the spontaneous speech, the word error rate falls from 44.38% to 44.06%. The repairs fail on the fourth dataset, probably because of its small size.
Keywords: ASR Repair; Automatic speech recognition; Computer and Information Sciences; Dialogue systems; Language models; Speech recognition repair
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-305922
BASE
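The first entry reports its results as word error rate (WER). As a hedged illustration only, and not the thesis's own evaluation code, the sketch below computes WER in the standard way: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. It also computes the relative reduction implied by the reported before/after figures. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the weighted WER and semantic-similarity metrics mentioned in the abstract are not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference words),
    computed as a word-level Levenshtein distance (hypothetical helper)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which an error rate dropped (negative if it rose)."""
    return (before - after) / before


if __name__ == "__main__":
    # One deletion ("en") and one substitution ("dag" -> "dagar") over 5 words.
    print(word_error_rate("det är en fin dag", "det är fin dagar"))  # 0.4
    # Relative reductions implied by the WER figures reported in the abstract.
    for before, after in [(13.69, 3.05), (36.23, 21.17), (44.38, 44.06)]:
        print(before, after, round(relative_reduction(before, after), 3))
```

Under this definition, the drop from 13.69% to 3.05% is roughly a 78% relative reduction and 36.23% to 21.17% roughly 42%, while 44.38% to 44.06% is under 1% relative. Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why it is a rate rather than a bounded accuracy score.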
2
Developing discourse structure analysis for use on conversations that include people with aphasia
In: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1594159643173734 (2020)
BASE
3
Neural mechanisms for monitoring and halting of spoken word production
BASE
4
Conversational trouble and repair in dementia: revision of an existing coding framework
BASE
5
Other-Initiated Self-Repairs in Student-Student Interaction: The Frequency of Occurrence and Mechanism
In: JEELS (Journal of English Education and Linguistics Studies), Vol 6, Iss 1, Pp 91-110 (2019)
BASE
6
The role of L2 experience in L1 phonotactic restructuring in sequential bilinguals ...
Alcorn, Steven Michael (ORCID 0000-0002-3199-1826). - : The University of Texas at Austin, 2018
BASE
7
The role of L2 experience in L1 phonotactic restructuring in sequential bilinguals
BASE
8
The Use of Gesture in Self-Initiated Self-Repair Sequences by Persons with Non-Fluent Aphasia
In: Theses and Dissertations--Linguistics (2016)
BASE
9
Conversation breakdowns in the audiology clinic: the importance of mutual gaze
Ekberg, Katie; Hickson, Louise; Grenness, Caitlin. - : John Wiley & Sons, 2016
BASE
10
Who said what? Sampling conversation repair behavior involving adults with acquired hearing impairment
Lind, Christopher; Hickson, Louise; Erber, Norman. - : Thieme Medical Publishers, 2010
BASE
11
Conversational Repair Strategies in Adolescents with Autism Spectrum Disorders
In: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1225745290 (2008)
BASE
12
Dental-to-velar perceptual assimilation: A cross-linguistic study of the perception of dental stop+/l/ clusters
In: https://halshs.archives-ouvertes.fr/halshs-00129735 (2007)
BASE
13
Conversation repair and adult cochlear implantation: A qualitative case study
Lind, C.; Hickson, L. M. H.; Erber, N. - : John Wiley & Sons, 2006
BASE
14
Exchange of disfluency with age from function words to content words in Spanish speakers who stutter
In: Journal of Speech, Language, and Hearing Research, 46 (3), 754-765 (2003)
BASE
15
Interactive Electronic Technical Manuals (IETMs) Annotated Bibliography
In: DTIC (2002)
BASE
16
Phonetic Consequences of Speech Disfluency
In: DTIC (1999)
BASE
17
Utterance rate and linguistic properties as determinants of lexical dysfluencies in children who stutter
In: Journal of the Acoustical Society of America, 105 (1), 481-490 (1999)
BASE
18
Putting People First: Specifying Proper Names in Speech Interfaces
In: http://www.media.mit.edu/speech/papers/1994/marx_UIST94_putting_people_first.ps.gz (1994)
BASE
19
Detection and Correction of Repairs in Human-Computer Dialog
In: DTIC (1992)
BASE
20
Research in Knowledge Representation for Natural Language Communication and Planning Assistance
In: DTIC and NTIS (1988)
BASE


Hits by source type:
Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 22
© 2013 - 2024 Lin|gu|is|tik