Hits 1 – 20 of 33

1	Articulatory representations to address acoustic variability in speech
	Sivaraman, Ganesh. - 2017
	Abstract: The past decade has seen phenomenal improvement in the performance of Automatic Speech Recognition (ASR) systems. In spite of this vast improvement in performance, the state-of-the-art still lags significantly behind human speech recognition. Even though certain systems claim super-human performance, this performance often is sub-par across domains and across datasets. This gap is predominantly due to the lack of robustness against speech variability. Even clean speech is extremely variable due to a large number of factors such as voice characteristics, speaking style, speaking rate, accents, casualness, emotions and more. The goal of this thesis is to investigate the variability of speech from the perspective of speech production, put forth robust articulatory features to address this variability, and to incorporate these features in state-of-the-art ASR systems in the best way possible. ASR systems model speech as a sequence of distinctive phone units like beads on a string. Although phonemes are distinctive units in the cognitive domain, their physical realizations are extremely varied due to coarticulation and lenition which are commonly observed in conversational speech. The traditional approaches deal with this issue by performing di-, tri- or quin-phone based acoustic modeling but are insufficient to model longer contextual dependencies. Articulatory phonology analyzes speech as a constellation of coordinated articulatory gestures performed by the articulators in the vocal tract (lips, tongue tip, tongue body, jaw, glottis and velum). In this framework, acoustic variability is explained by the temporal overlap of gestures and their reduction in space. In order to analyze speech in terms of articulatory gestures, the gestures need to be estimated from the speech signal. 
The first part of the thesis focuses on a speaker independent acoustic-to-articulatory inversion system that was developed to estimate vocal tract constriction variables (TVs) from speech. The mapping from acoustics to TVs was learned from the multi-speaker X-ray Microbeam (XRMB) articulatory dataset. Constriction regions from TV trajectories were defined as articulatory gestures using articulatory kinematics. The speech inversion system combined with the TV kinematics based gesture annotation provided a system to estimate articulatory gestures from speech.
The second part of this thesis deals with the analysis of the articulatory trajectories under different types of variability such as multiple speakers, speaking rate, and accents. It was observed that speaker variation degraded the performance of the speech inversion system. A Vocal Tract Length Normalization (VTLN) based speaker normalization technique was therefore developed to address the speaker variability in the acoustic and articulatory domains. The performance of speech inversion systems was analyzed on an articulatory dataset containing speaking rate variations to assess if the model was able to reliably predict the TVs in challenging coarticulatory scenarios. The performance of the speech inversion system was analyzed in cross accent and cross language scenarios through experiments on a Dutch and British English articulatory dataset. These experiments provide a quantitative measure of the robustness of the speech inversion systems to different speech variability.
The final part of this thesis deals with the incorporation of articulatory features in state-of-the-art medium vocabulary ASR systems. A hybrid convolutional neural network (CNN) architecture was developed to fuse the acoustic and articulatory feature streams in an ASR system. ASR experiments were performed on the Wall Street Journal (WSJ) corpus. Several articulatory feature combinations were explored to determine the best feature combination. Cross-corpus evaluations were carried out to evaluate the WSJ trained ASR system on the TIMIT and another dataset containing speaking rate variability. Results showed that combining articulatory features with acoustic features through the hybrid CNN improved the performance of the ASR system in matched and mismatched evaluation conditions. The findings based on this dissertation indicate that articulatory representations extracted from acoustics can be used to address acoustic variability in speech observed due to speakers, accents, and speaking rates and further be used to improve the performance of Automatic Speech Recognition systems.
	Keyword: Articulatory features; Articulatory phonology; Automatic Speech Recognition; Electrical engineering; Linguistics; Speaker adaptation; Speech inversion; Speech variability
	URL: https://doi.org/10.13016/M2BK16R29 http://hdl.handle.net/1903/20422
	BASE

2	Improved vocal tract reconstruction and modeling using an image super-resolution technique
	Zhou, Xinhui; Woo, Jonghye; Stone, Maureen. - : Acoustical Society of America, 2013
	BASE

3	Articulatory information for noise robust speech recognition
	Nam, Hosung; Espy-Wilson, Carol Y.; Saltzman, Elliot...
	In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 19 (2011) 7, 1913-1924
	BLLDB
	OLC Linguistik

4	Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies
	Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol Y.. - 2010
	BASE

5	ARTICULATORY INFORMATION FOR ROBUST SPEECH RECOGNITION
	Mitra, Vikramjit. - 2010
	BASE

6	An MRI-based articulatory and acoustic study of American English liquid sounds /r/ and /l/
	Zhou, Xinhui. - 2009
	BASE

7	A magnetic resonance imaging-based articulatory and acoustic study of “retroflex” and “bunched” American English /r/
	Zhou, Xinhui; Espy-Wilson, Carol Y.; Boyce, Suzanne. - : Acoustical Society of America, 2008
	BASE

8	Synergy of Acoustic-Phonetics and Auditory Modeling Towards Robust Speech Recognition
	Deshmukh, Om Dadaji. - 2006
	BASE

9	Robust Voice Mining Techniques for Telephone Conversations
	Manocha, Sandeep. - 2006
	BASE

10	Use of temporal information : detection of periodicity, aperiodicity, and pitch in speech
	Espy-Wilson, Carol Y.; Salomon, Ariel; Singh, Jawahar...
	In: Institute of Electrical and Electronics Engineers. IEEE transactions on speech and audio processing. - New York, NY : Inst. 13 (2005) 5, 776-786
	BLLDB
	OLC Linguistik

11	Acoustic parameters for automatic detection of nasal manner
	Pruthi, Tarun; Espy-Wilson, Carol Y.
	In: Speech communication. - Amsterdam [u.a.] : Elsevier 43 (2004) 3, 225-240
	OLC Linguistik

12	Acoustic parameters for automatic detection of nasal manner
	Pruthi, Tarun; Espy-Wilson, Carol Y.
	In: Speech communication. - Amsterdam [u.a.] : Elsevier 43 (2004) 3, 225-239
	BLLDB

13	Acoustic modeling of American English /r/
	Espy-Wilson, Carol Y.; Boyce, Suzanne E.; Jackson, Michel...
	In: Acoustical Society of America. The journal of the Acoustical Society of America. - Melville, NY : AIP 108 (2000) 1, 343-356
	BLLDB

14	Articulatory tradeoffs reduce acoustic variability during American English /r/ production
	Guenther, Frank H.; Espy-Wilson, Carol Y.; Boyce, Suzanne E....
	In: Acoustical Society of America. The journal of the Acoustical Society of America. - Melville, NY : AIP 105 (1999) 5, 2854-2865
	BLLDB

15	Speech - Articles and Reports - Enhancement of Electrolaryngeal Speech by Adaptive Filtering
	Espy-Wilson, Carol Y.; Chari, Venkatesh R.; MacAuslan, Joel M....
	In: Journal of speech, language, and hearing research. - Rockville, Md. : American Speech-Language-Hearing Association 41 (1998) 6, 1253-1264
	OLC Linguistik

16	Enhancement of electrolaryngeal speech by adaptive filtering
	Espy-Wilson, Carol Y.; Chari, Venkatesh R.; MacAuslan, Joel M....
	In: Journal of speech, language, and hearing research. - Rockville, Md. : American Speech-Language-Hearing Association 41 (1998) 6, 1253-1264
	BLLDB

17	Intraspeaker Comparisons of Acoustic and Articulatory Variability in American English /r/ Productions
	Matthies, Melanie L.; Perkell, Joseph S.; Boyce, Suzanne E.. - : Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems, 1997
	BASE

18	Coarticulatory stability in American English /r/
	Boyce, Suzanne E.; Espy-Wilson, Carol Y.
	In: Acoustical Society of America. The journal of the Acoustical Society of America. - Melville, NY : AIP 101 (1997) 6, 3741-3753
	BLLDB

19	Speech Communication
	Boyce, Suzanne E.; Vick, Jennell C.; Chuang, Erika S.. - : Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT), 1997
	BASE

20	Coarticulatory stability in American English /r/
	Boyce, Suzanne E.; Espy-Wilson, Carol Y.
	In: ICSLP <4, 1996, Philadelphia, Pa.>. Proceedings ; 3. - Wilmington, Del. : Applied Science and Engineering Laboratories (1996), 1577-1580
	BLLDB