1 |
Repairing Swedish Automatic Speech Recognition ; Korrigering av Automatisk Taligenkänning för Svenska
Rehn, Karla. - : KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021
Abstract:
The quality of automatic speech recognition has increased dramatically in the last few years, but performance for low- and middle-resource languages such as Swedish is still far from optimal. In this project, KB-BERT, a neural language model trained on large written corpora, is used to improve the quality of Swedish transcriptions. The language model is inserted as a repair module after the automatic speech recognition step and aims to repair the original output into a transcription that more closely resembles the ground truth, using a sequence-to-sequence translation approach in which KB-BERT initializes the model. Two automatic speech recognition models are used to transcribe the speech: one developed in this project using the Kaldi framework, and Microsoft's Azure Speech to Text platform. The translator is evaluated on four datasets, three consisting of read speech and one of spontaneous speech. The spontaneous speech dataset and one of the read speech datasets include both native and non-native speakers. Performance is measured by three metrics: word error rate, a weighted word error rate, and a semantic similarity measure. The repairs improve the transcriptions of two of the read speech datasets significantly, decreasing the word error rate from 13.69% to 3.05% and from 36.23% to 21.17%. On the spontaneous speech data, the repairs reduce the word error rate only marginally, from 44.38% to 44.06%. On the last read speech dataset the repairs fail and instead increase the word error rate, likely because of the dataset's small size.
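For readers unfamiliar with the main metric quoted in the abstract, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; the thesis may use a different tool or text normalization, and its weighted variant is not reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: splits on whitespace and applies no
    normalization (casing, punctuation), which real evaluations often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one deleted word and one substituted word
# against a five-word reference.
example = word_error_rate("det här är ett test", "det är ett fest")
```

Because the denominator is the reference length, a hypothesis with many inserted words can yield a value above 1.0 (that is, above 100%).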
Keyword:
ASR Repair; Automatic speech recognition; Computer and Information Sciences; Dialogue systems; Language models
URL: http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-305922
BASE
2 |
Developing discourse structure analysis for use on conversations that include people with aphasia
In: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1594159643173734 (2020)
BASE
3 |
Neural mechanisms for monitoring and halting of spoken word production
BASE
4 |
Conversational trouble and repair in dementia: revision of an existing coding framework
BASE
5 |
OTHER-INITIATED SELF-REPAIRS IN STUDENT-STUDENT INTERACTION: THE FREQUENCY OF OCCURRENCE AND MECHANISM
In: JEELS (Journal of English Education and Linguistics Studies), Vol 6, Iss 1, Pp 91-110 (2019)
BASE
6 |
The role of L2 experience in L1 phonotactic restructuring in sequential bilinguals ...
BASE
7 |
The role of L2 experience in L1 phonotactic restructuring in sequential bilinguals
BASE
8 |
The Use of Gesture in Self-Initiated Self-Repair Sequences by Persons with Non-Fluent Aphasia
In: Theses and Dissertations--Linguistics (2016)
BASE
9 |
Conversation breakdowns in the audiology clinic: the importance of mutual gaze
BASE
10 |
Who said what? Sampling conversation repair behavior involving adults with acquired hearing impairment
BASE
11 |
Conversational Repair Strategies in Adolescents with Autism Spectrum Disorders
In: http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1225745290 (2008)
BASE
12 |
Dental-to-velar perceptual assimilation: A cross-linguistic study of the perception of dental stop+/l/ clusters
In: https://halshs.archives-ouvertes.fr/halshs-00129735 (2007)
BASE
13 |
Conversation repair and adult cochlear implantation: A qualitative case study
BASE
14 |
Exchange of disfluency with age from function words to content words in Spanish speakers who stutter
In: J SPEECH LANG HEAR R, 46 (3) 754-765 (2003)
BASE
15 |
Interactive Electronic Technical Manuals (IETMs) Annotated Bibliography
In: DTIC (2002)
BASE
17 |
Utterance rate and linguistic properties as determinants of lexical dysfluencies in children who stutter
In: J ACOUST SOC AM, 105 (1) 481-490 (1999)
BASE
18 |
Putting People First: Specifying Proper Names in Speech Interfaces
In: http://www.media.mit.edu/speech/papers/1994/marx_UIST94_putting_people_first.ps.gz (1994)
BASE
19 |
Detection and Correction of Repairs in Human-Computer Dialog
In: DTIC (1992)
BASE
20 |
Research in Knowledge Representation for Natural Language Communication and Planning Assistance
In: DTIC AND NTIS (1988)
BASE