
Search in the Catalogues and Directories

Hits 1 – 20 of 68

1
Influence of Highly Inflected Word Forms and Acoustic Background on the Robustness of Automatic Speech Recognition for Human–Computer Interaction
In: Mathematics; Volume 10; Issue 5; Pages: 711 (2022)
2
Discriminative feature modeling for statistical speech recognition ...
Tüske, Zoltán. - : RWTH Aachen University, 2021
3
Cross-lingual acoustic modeling in Upper Sorbian - preliminary study
In: Fraunhofer IKTS (2021)
4
Glottal Stops in Upper Sorbian: A Data-Driven Approach
In: Fraunhofer IKTS (2021)
5
Estimating the Degree of Sleepiness by Integrating Articulatory Feature Knowledge in Raw Waveform Based CNNs ...
6
Estimating the Degree of Sleepiness by Integrating Articulatory Feature Knowledge in Raw Waveform Based CNNs ...
7
Dealing with linguistic mismatches for automatic speech recognition
Yang, Xuesong. - 2019
8
Speech recognition with probabilistic transcriptions and end-to-end systems using deep learning
Das, Amit. - 2018
Abstract: In this thesis, we develop deep learning models for automatic speech recognition (ASR) for two contrasting tasks, characterized by the amount of labeled data available for training. In the first half, we deal with scenarios in which there are limited or no labeled data for training ASR systems, a situation common for under-resourced languages. In the second half, we train ASR systems with large amounts of labeled English data, with the objective of improving modern end-to-end (E2E) ASR using attention modeling. The two primary contributions of this thesis are the following:
Cross-Lingual Speech Recognition in Under-Resourced Scenarios: A well-resourced language has an abundance of resources to support the development of speech technology; such resources are usually defined as 100+ hours of speech data, corresponding transcriptions, pronunciation dictionaries, and language models. An under-resourced language lacks one or more of these resources. The most expensive and time-consuming resource to acquire is transcription, owing to the difficulty of finding native transcribers. The first part of the thesis proposes methods by which deep neural networks (DNNs) can be trained when there are limited or no transcribed data in the target language. The two key components of this proposal are transfer learning and crowdsourcing. Through these methods, we demonstrate that it is possible to borrow statistical knowledge of acoustics from a variety of well-resourced languages to learn the parameters of the DNN for the target under-resourced language. In particular, we use well-resourced languages as cross-entropy regularizers to improve the generalization capacity of the target-language model.
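The idea of using well-resourced languages as cross-entropy regularizers can be sketched as a joint objective that adds a weighted auxiliary cross-entropy term for a source language. This is only a minimal illustration of the principle: the shapes, the weighting constant `lam`, and all function names below are assumptions for the sketch, not the thesis's implementation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def regularized_loss(tgt_logits, tgt_labels, src_logits, src_labels, lam=0.3):
    """Target-language cross-entropy plus a well-resourced source
    language's cross-entropy acting as a regularizing auxiliary task."""
    return (cross_entropy(tgt_logits, tgt_labels)
            + lam * cross_entropy(src_logits, src_labels))
```

With `lam=0` the objective reduces to plain target-language training; increasing `lam` trades target fit for statistical knowledge borrowed from the source language.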
A key accomplishment of this study is that it is the first to train DNNs using noisy labels in the target language transcribed by non-native speakers recruited through online marketplaces.
End-to-End Large-Vocabulary Automatic Speech Recognition: Recent advances in ASR are due mostly to the advent of deep learning models, which can discover the complex non-linear relationships between attributes that are typical of real-world tasks. Despite these advances, building a conventional ASR system remains cumbersome, since several components must be optimized separately in a disjoint fashion. To alleviate this problem, modern ASR systems adopt a new approach of directly transducing speech signals to text. Such systems are known as E2E systems; one example is Connectionist Temporal Classification (CTC). One drawback of CTC, however, is the hard-alignment problem: it relies only on the current input to generate the current output, whereas in reality the current output is influenced not only by the current input but also by inputs in the past and the future. The second part of the thesis therefore advances state-of-the-art E2E speech recognition for large corpora by incorporating attention modeling directly within the CTC framework. In attention modeling, inputs in the present, past, and future are weighted individually according to the degree of influence they exert on the current output. We accomplish this by deriving new context vectors from time-convolution features to model attention as part of the CTC network. To further improve attention modeling, we extract more reliable content information from a network representing an implicit language model. Finally, we use vector-based attention weights that are applied to context vectors across both time and their individual components. A key accomplishment of this study is that it is the first to incorporate attention directly within the CTC network.
Furthermore, we show that our proposed attention-based CTC model, even in the absence of an explicit language model, achieves lower word error rates than a well-trained conventional ASR system equipped with a strong external language model.
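The attention idea the abstract describes, weighting past, current, and future frames to form a context vector at each time step, can be sketched with a local window over encoder states. The dot-product scoring, the window radius `tau`, and the function names are illustrative assumptions for this sketch, not the thesis's actual CTC-attention formulation.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_context(H, tau=2):
    """For each frame t, score the encoder states in the window
    [t - tau, t + tau] against H[t], normalize the scores with a
    softmax, and return the weighted context vectors (T x d)."""
    T, d = H.shape
    C = np.zeros_like(H)
    for t in range(T):
        lo, hi = max(0, t - tau), min(T, t + tau + 1)
        scores = H[lo:hi] @ H[t] / np.sqrt(d)  # similarity to current frame
        alpha = softmax(scores)                # attention weights over window
        C[t] = alpha @ H[lo:hi]                # weighted context vector
    return C
```

With `tau=0` each context vector collapses to the current frame, which mirrors the hard-alignment behavior of plain CTC that the thesis sets out to relax.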
Keywords: acoustic modeling; attention; cross-lingual; crowdsourcing; CTC; deep neural networks; end-to-end; recurrent neural networks; speech recognition; transfer learning; under-resourced
URL: http://hdl.handle.net/2142/102804
9
Phonetic Context Embeddings for DNN-HMM Phone Recognition
In: Interspeech 2016, Sep 2016, San Francisco, United States, pp. 405-409 ; https://hal.sorbonne-universite.fr/hal-02166078 ; ⟨10.21437/Interspeech.2016-1036⟩ (2016)
10
Robust automatic speech recognition for children ...
Gurunath Shivakumar, Prashanth. - : University of Southern California Digital Library (USC.DL), 2015
11
Modeling of a rise-fall intonation pattern in the language of young Paris speakers
In: Speech Prosody, 2014, 7, pp. 814-818 ; https://halshs.archives-ouvertes.fr/halshs-01069584 (2014)
12
Vers une modélisation acoustique de l'intonation des jeunes en région parisienne : une question de " proximité " ? [Towards an acoustic modeling of the intonation of young people in the Paris region: a question of "proximity"?]
In: Nouveaux Cahiers de Linguistique Française, Université de Genève, 2014, 31, pp. 257-171 ; ISSN: 1661-8246 ; https://halshs.archives-ouvertes.fr/halshs-01069593 (2014)
13
Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a 'light' acoustic model of the target language and testing 'heavyweight' models from five national languages
In: Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2014), May 2014, St Petersburg, Russia, pp. 153-160 ; https://halshs.archives-ouvertes.fr/halshs-00980431 (2014)
14
Modélisation acoustico-phonétique de langues peu dotées : Études phonétiques et travaux de reconnaissance automatique en luxembourgois [Acoustic-phonetic modeling of under-resourced languages: phonetic studies and automatic speech recognition work on Luxembourgish]
In: Journées d'Etude sur la Parole, Jan 2014, Le Mans, France ; https://hal.archives-ouvertes.fr/hal-01843399 (2014)
15
Speech Alignment and Recognition Experiments for Luxembourgish
In: Proceedings of the 4th International Workshop on Spoken Language Technologies for Underresourced Languages, May 2014, Saint-Petersbourg, Russia, pp. 53-60 ; https://hal.archives-ouvertes.fr/hal-01134824 ; http://www.mica.edu.vn/sltu2014/ (2014)
16
A First LVCSR System for Luxembourgish, a Low-Resourced European Language
In: Human Language Technology Challenges for Computer Science and Linguistics ; https://hal.archives-ouvertes.fr/hal-01135103 ; Zygmunt Vetulani; Joseph Mariani. Human Language Technology Challenges for Computer Science and Linguistics, 8387, Springer International Publishing, pp.479-490, 2014, 5th Language and Technology Conference, LTC 2011, Poznań, Poland, November 25--27, 2011, Revised Selected Papers, 978-3-319-08957-7. ⟨10.1007/978-3-319-08958-4_39⟩ (2014)
17
Impact of Video Modeling Techniques on Efficiency and Effectiveness of Clinical Voice Assessment
In: http://rave.ohiolink.edu/etdc/view?acc_num=miami1398686540 (2014)
18
Anger Recognition in Speech Using Acoustic and Linguistic Cues
Elsevier, 2013
19
Detection of acoustic-phonetic landmarks in mismatched conditions using a biomimetic model of human auditory processing
In: http://www.isle.uiuc.edu/%7Esborys/king_coling12.pdf (2012)
20
Detection of acoustic-phonetic landmarks in mismatched conditions using a biomimetic model of human auditory processing
In: http://aclweb.org/anthology/C/C12/C12-2058.pdf (2012)


© 2013 - 2024 Lin|gu|is|tik