Hits 1 – 12 of 12

1	Using Data Augmentation and Time-Scale Modification to Improve ASR of Children’s Speech in Noisy Environments
	Hemant Kumar Kathania; Sudarsana Reddy Kadiri; Paavo Alku; Mikko Kurimo
	In: Applied Sciences; Volume 11; Issue 18 (2021)
	Abstract: Current automatic speech recognition (ASR) systems perform poorly on children’s speech in noisy environments because recognizers are typically trained with clean adult speech. There are therefore two mismatches between the training and testing phases (i.e., clean speech in training vs. noisy speech in testing, and adult speech in training vs. child speech in testing). This article studies methods to tackle the effects of these two mismatches in recognition of noisy children’s speech by investigating two techniques: data augmentation and time-scale modification. In the former, clean training data of adult speakers are corrupted with additive noise in order to obtain training data that better correspond to the noisy testing conditions. In the latter, the fundamental frequency (F0) and speaking rate of children’s speech are modified in the testing phase in order to reduce differences in prosodic characteristics between the testing data of child speakers and the training data of adult speakers. A standard DNN–HMM-based ASR system was built, and the effects of data augmentation, F0 modification, and speaking rate modification on word error rate (WER) were evaluated first separately and then by combining all three techniques. The experiments were conducted using children’s speech corrupted with additive noise of four different noise types in four different signal-to-noise ratio (SNR) categories. The results show that the combination of all three techniques yielded the best ASR performance. As an example, the WER averaged over all four noise types in the SNR category of 5 dB dropped from 32.30% to 12.09% when the baseline system, in which neither data augmentation nor time-scale modification was used, was replaced with a recognizer built using a combination of all three techniques.
In summary, in recognizing noisy children’s speech with ASR systems trained with clean adult speech, considerable improvements in the recognition performance can be achieved by combining data augmentation based on noise addition in the system training phase and time-scale modification based on modifying F0 and speaking rate of children’s speech in the testing phase.
	Keyword: data augmentation; DNN; recognition of children’s speech; time-scale modification
	URL: https://doi.org/10.3390/app11188420
	BASE
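The abstract above rests on two measurable ingredients: corrupting clean speech with additive noise at a chosen SNR, and scoring recognizers by WER. The following is a minimal plain-Python sketch of both, not the authors' pipeline (which trained a DNN–HMM recognizer). The helper names `add_noise_at_snr`, `snr_db`, and `word_error_rate` are hypothetical; the noise scaling assumes the standard power-ratio definition SNR = 10·log10(P_speech / P_noise), and WER is assumed to be word-level Levenshtein distance divided by the number of reference words. For reference, the reported drop from 32.30% to 12.09% at 5 dB is an absolute reduction of 20.21 percentage points, or about 62.6% relative.

```python
import math
import random


def add_noise_at_snr(speech, noise, target_snr_db):
    """Return speech + g * noise, with gain g chosen so the mixture has the
    requested SNR in dB. Assumes equal-length lists of float samples and a
    noise signal that is not all zeros (hypothetical helper)."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Solve 10*log10(p_speech / (g^2 * p_noise)) == target_snr_db for g.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [x + gain * n for x, n in zip(speech, noise)]


def snr_db(speech, noisy):
    """Measured SNR of `noisy` relative to the clean reference `speech`."""
    residual = [y - x for x, y in zip(speech, noisy)]
    p_s = sum(x * x for x in speech)
    p_n = sum(r * r for r in residual)
    return 10 * math.log10(p_s / p_n)


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / len(reference words),
    via word-level Levenshtein distance. Assumes a non-empty reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# Usage with synthetic signals: a 200 Hz tone at 16 kHz and Gaussian noise
# stand in for real speech and noise recordings.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = add_noise_at_snr(clean, noise, 5.0)
print(round(snr_db(clean, noisy), 2))  # 5.0
print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 3))  # 0.167
```

Scaling the noise rather than the speech keeps the speech level fixed across SNR conditions. A fuller setup would typically also need to trim or loop noise recordings to each utterance's length; the time-scale modification (F0 and speaking-rate changes) described in the abstract is not covered by this sketch.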

2	Changes in . . . and Subjective Voice Complaints in Call Center Customer-Service Advisors During One Working Day
	Laura Lehto; Laura Laaksonen; Erkki Vilkman...
	In: http://lib.tkk.fi/Diss/2007/isbn9789512286980/article3.pdf (2008)
	BASE

3	Automatic and controlled processing of acoustic and phonetic contrasts
	Elyse Sussman; Teija Kujala; Jaana Halmetoja...
	In: http://neuroscience.aecom.yu.edu/labs/sussmanlab/Pubs/Sussman_autoandcontr.pdf (2003)
	BASE

4	Analysis of speech
	Teemu Rinne; Kimmo Alho; Paavo Alku...
	In: http://spin.ecn.purdue.edu/fmri/PDFLibrary/RinneT_NR_1999_10_1113_1117.pdf
	BASE

5	Children Learning a Non-native Vowel – The Effect of a Two-day Production Training
	Laura Taimi; Katri Jähi; Paavo Alku...
	In: http://ojs.academypublisher.com/index.php/jltr/article/viewFile/jltr050612291235/10234/
	BASE

6	Group Intervention Changes Brain Activity in Bilingual Language-Impaired Children
	Elina Pihko; Annika Mickos; Teija Kujala...
	In: http://cercor.oxfordjournals.org/content/17/4/849.full.pdf
	BASE

7	Temporally Weighted Linear Prediction Features for Tackling Additive Noise in Speaker Verification
	Rahim Saeidi; Jouni Pohjalainen...
	In: http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/swlpspl.pdf
	BASE

8	Parameterization of the Glottal Closing Phase Characteristics in Different Phonation Types
	Laura Lehto; Matti Airas; Eva Björkner...
	In: http://lib.tkk.fi/Diss/2007/isbn9789512286980/article6.pdf
	BASE

9	Towards Glottal Source Controllability in Expressive Speech Synthesis
	Jaime Lorenzo-Trueba; Roberto Barra-Chicote; Tuomo Raitio...
	In: http://www-gth.die.upm.es/research/documentation/AG-112Tow-12.pdf
	BASE

10	Speaker Identification from Shouted Speech: Analysis and Compensation
	Cemal Hanilçi; Tomi Kinnunen; Rahim Saeidi...
	In: http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/shouted_speaker_ID.pdf
	BASE

11	On Separating Glottal Source and Vocal Tract Information in Telephony Speaker Verification
	Tomi Kinnunen; Paavo Alku
	In: http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/iaif-speaker-recognition.pdf
	BASE

12	Research Article (Open Access)
	Eira Jansson-Verkasalo; Timo Ruusuvirta; Minna Huotilainen...
	In: ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/a6/6f/BMC_Neurosci_2010_Jul_30_11_88.tar.gz
	BASE

© 2013 - 2024 Lin|gu|is|tik