
Search in the Catalogues and Directories

Page: 1 2
Hits 1 – 20 of 33

1
Articulatory representations to address acoustic variability in speech
Abstract: The past decade has seen phenomenal improvement in the performance of Automatic Speech Recognition (ASR) systems. In spite of this improvement, the state of the art still lags significantly behind human speech recognition. Even though certain systems claim super-human performance, this performance is often sub-par across domains and datasets. This gap is predominantly due to a lack of robustness against speech variability. Even clean speech is extremely variable due to a large number of factors such as voice characteristics, speaking style, speaking rate, accents, casualness, emotions and more. The goal of this thesis is to investigate the variability of speech from the perspective of speech production, to put forth robust articulatory features that address this variability, and to incorporate these features into state-of-the-art ASR systems in the best way possible.
ASR systems model speech as a sequence of distinctive phone units, like beads on a string. Although phonemes are distinctive units in the cognitive domain, their physical realizations are extremely varied due to coarticulation and lenition, which are commonly observed in conversational speech. Traditional approaches deal with this issue by performing di-, tri- or quin-phone based acoustic modeling, but these are insufficient to model longer contextual dependencies. Articulatory phonology analyzes speech as a constellation of coordinated articulatory gestures performed by the articulators in the vocal tract (lips, tongue tip, tongue body, jaw, glottis and velum). In this framework, acoustic variability is explained by the temporal overlap of gestures and their reduction in space. In order to analyze speech in terms of articulatory gestures, the gestures need to be estimated from the speech signal.
The first part of the thesis focuses on a speaker-independent acoustic-to-articulatory inversion system that was developed to estimate vocal tract constriction variables (TVs) from speech. The mapping from acoustics to TVs was learned from the multi-speaker X-ray Microbeam (XRMB) articulatory dataset. Constriction regions from TV trajectories were defined as articulatory gestures using articulatory kinematics. The speech inversion system, combined with the TV kinematics based gesture annotation, provided a system to estimate articulatory gestures from speech.
The second part of this thesis deals with the analysis of the articulatory trajectories under different types of variability such as multiple speakers, speaking rate, and accents. It was observed that speaker variation degraded the performance of the speech inversion system. A Vocal Tract Length Normalization (VTLN) based speaker normalization technique was therefore developed to address speaker variability in the acoustic and articulatory domains. The performance of speech inversion systems was analyzed on an articulatory dataset containing speaking rate variations to assess whether the model was able to reliably predict the TVs in challenging coarticulatory scenarios. The performance of the speech inversion system was also analyzed in cross-accent and cross-language scenarios through experiments on a Dutch and British English articulatory dataset. These experiments provide a quantitative measure of the robustness of the speech inversion systems to different types of speech variability.
The final part of this thesis deals with the incorporation of articulatory features in state-of-the-art medium vocabulary ASR systems. A hybrid convolutional neural network (CNN) architecture was developed to fuse the acoustic and articulatory feature streams in an ASR system. ASR experiments were performed on the Wall Street Journal (WSJ) corpus. Several articulatory feature combinations were explored to determine the best feature combination. Cross-corpus evaluations were carried out to evaluate the WSJ-trained ASR system on the TIMIT corpus and on another dataset containing speaking rate variability. Results showed that combining articulatory features with acoustic features through the hybrid CNN improved the performance of the ASR system in matched and mismatched evaluation conditions.
The findings of this dissertation indicate that articulatory representations extracted from acoustics can be used to address the acoustic variability in speech that arises from speakers, accents, and speaking rates, and can further be used to improve the performance of Automatic Speech Recognition systems.
Keyword: Articulatory features; Articulatory phonology; Automatic Speech Recognition; Electrical engineering; Linguistics; Speaker adaptation; Speech inversion; Speech variability
URL: https://doi.org/10.13016/M2BK16R29
http://hdl.handle.net/1903/20422
BASE
2
Improved vocal tract reconstruction and modeling using an image super-resolution technique
Zhou, Xinhui; Woo, Jonghye; Stone, Maureen. - : Acoustical Society of America, 2013
BASE
3
Articulatory information for noise robust speech recognition
In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 19 (2011) 7, 1913-1924
BLLDB
OLC Linguistik
4
Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies
BASE
5
ARTICULATORY INFORMATION FOR ROBUST SPEECH RECOGNITION
BASE
6
An MRI-based articulatory and acoustic study of American English liquid sounds /r/ and /l/
Zhou, Xinhui. - 2009
BASE
7
A magnetic resonance imaging-based articulatory and acoustic study of “retroflex” and “bunched” American English /r/
Zhou, Xinhui; Espy-Wilson, Carol Y.; Boyce, Suzanne. - : Acoustical Society of America, 2008
BASE
8
Synergy of Acoustic-Phonetics and Auditory Modeling Towards Robust Speech Recognition
BASE
9
Robust Voice Mining Techniques for Telephone Conversations
BASE
10
Use of temporal information : detection of periodicity, aperiodicity, and pitch in speech
In: Institute of Electrical and Electronics Engineers. IEEE transactions on speech and audio processing. - New York, NY : Inst. 13 (2005) 5, 776-786
BLLDB
OLC Linguistik
11
Acoustic parameters for automatic detection of nasal manner
In: Speech communication. - Amsterdam [u.a.] : Elsevier 43 (2004) 3, 225-240
OLC Linguistik
12
Acoustic parameters for automatic detection of nasal manner
In: Speech communication. - Amsterdam [u.a.] : Elsevier 43 (2004) 3, 225-239
BLLDB
13
Acoustic modeling of American English /r/
In: Acoustical Society of America. The journal of the Acoustical Society of America. - Melville, NY : AIP 108 (2000) 1, 343-356
BLLDB
14
Articulatory tradeoffs reduce acoustic variability during American English /r/ production
In: Acoustical Society of America. The journal of the Acoustical Society of America. - Melville, NY : AIP 105 (1999) 5, 2854-2865
BLLDB
15
Speech - Articles and Reports - Enhancement of Electrolaryngeal Speech by Adaptive Filtering
In: Journal of speech, language, and hearing research. - Rockville, Md. : American Speech-Language-Hearing Association 41 (1998) 6, 1253-1264
OLC Linguistik
16
Enhancement of electrolaryngeal speech by adaptive filtering
In: Journal of speech, language, and hearing research. - Rockville, Md. : American Speech-Language-Hearing Association 41 (1998) 6, 1253-1264
BLLDB
17
Intraspeaker Comparisons of Acoustic and Articulatory Variability in American English /r/ Productions
Matthies, Melanie L.; Perkell, Joseph S.; Boyce, Suzanne E.. - : Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems, 1997
BASE
18
Coarticulatory stability in American English /r/
In: Acoustical Society of America. The journal of the Acoustical Society of America. - Melville, NY : AIP 101 (1997) 6, 3741-3753
BLLDB
19
Speech Communication
Boyce, Suzanne E.; Vick, Jennell C.; Chuang, Erika S.. - : Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT), 1997
BASE
20
Coarticulatory stability in American English /r/
In: ICSLP <4, 1996, Philadelphia, Pa.>. Proceedings ; 3. - Wilmington, Del. : Applied Science and Engineering Laboratories (1996), 1577-1580
BLLDB

© 2013 - 2024 Lin|gu|is|tik