
Search in the Catalogues and Directories

Hits 1 – 20 of 80

1
Program Logic for Weak Memory Concurrency ...
Doko, Marko. - : Technische Universität Kaiserslautern, 2021
BASE
2
Neural Network Learning for Robust Speech Recognition
Qu, Leyuan. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2021
Abstract: Recently, end-to-end architectures have dominated the modeling of Automatic Speech Recognition (ASR) systems. Conventional systems usually consist of independent components, such as an acoustic model, a language model and a pronunciation model. In comparison, end-to-end ASR approaches aim to directly map acoustic inputs to character or word sequences, which significantly simplifies the complex training procedure. Many end-to-end architectures have been proposed, for instance Connectionist Temporal Classification (CTC), Sequence Transduction with Recurrent Neural Networks (RNN-T) and attention-based encoder-decoder models, which have achieved impressive performance on a variety of benchmarks and even reached human level on some tasks. However, despite these advanced deep neural network architectures, the performance of ASR systems degrades significantly in adverse environments because of environmental noise or ambient reverberation. To improve the robustness of ASR systems, in this thesis we address the research questions and conduct experiments from the following perspectives:

Firstly, to learn more stable visual representations, we propose LipSound and LipSound2 and investigate to what extent the visual modality contains semantic information that can benefit ASR performance. The LipSound/LipSound2 model consists of an encoder-decoder with a location-aware attention architecture and directly transforms mouth or face movement sequences into low-level speech representations, i.e. mel-scale spectrograms. The model is trained in a crossmodal self-supervised fashion and does not require any human annotations, since the model inputs (visual sequences) and outputs (audio signals) are naturally paired in videos. Experimental results show that the LipSound model not only generates quality mel-spectrograms but also outperforms state-of-the-art models on the GRID benchmark dataset in speaker-dependent settings. Moreover, the improved LipSound2 model further verifies the approach's generalizability (speaker-independent) and transferability (non-Chinese to Chinese) on large vocabulary continuous speech corpora.

Secondly, to exploit the fact that the image of a face contains information about the person's speech sound, we incorporate face embeddings extracted from a pretrained face recognition model into the target speech separation model; these embeddings guide the system in predicting a target speaker mask in the time-frequency domain. The experimental results show that a pre-enrolled face image helps separate the expected speech signals. Additionally, face information is complementary to a voice reference: further improvement can be achieved by combining face and voice embeddings.

Thirdly, to integrate domain knowledge, i.e. articulatory features (AFs), into end-to-end learning, we present two approaches: (a) fine-tuning networks, which reuse hidden layer representations of AF extractors as input for ASR tasks; (b) progressive networks, which incorporate articulatory knowledge through lateral connections from AF extractors. Results show that progressive networks are more effective and achieve a lower word error rate than fine-tuning networks and other baseline models.

Finally, to enable end-to-end ASR models to acquire Out-of-Vocabulary (OOV) words, instead of just fine-tuning with audio containing OOV words, we propose to rescale the loss at sentence level or word level, which encourages models to pay more attention to unknown words. Experimental results reveal that fine-tuning the baseline ASR model with loss rescaling and L2/EWC (Elastic Weight Consolidation) regularization can significantly improve the recall rate of OOV words and effectively mitigate catastrophic forgetting. Furthermore, loss rescaling at the word level is more stable than the sentence-level method and results in less ASR performance loss on general non-OOV words and the LibriSpeech dataset.

In sum, this thesis contributes to the robustness of ASR systems by leveraging additional visual sequences, face information and domain knowledge. We achieve significant improvement on speech reconstruction, speech separation, end-to-end modeling and OOV word recognition tasks.
Keyword: 004: Computer science; ddc:004
URL: https://ediss.sub.uni-hamburg.de/handle/ediss/9437
http://nbn-resolving.de/urn:nbn:de:gbv:18-ediss-98286
BASE
3
Student Performance and Collaboration in Introductory Courses to Theory of Computation ; Studierendenperformance und Kollaboration in Einführungskursen der Theoretischen Informatik
Frede, Christiane. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2021
BASE
4
Classifying user information needs in cooking dialogues – an empirical performance evaluation of transformer networks
BASE
5
Entwicklung und Evaluation eines Tools zur lexikonbasierten Sentiment Analysis für die Digital Humanities
Dangel, Johanna. - 2021
BASE
6
Language representations for computational argumentation
Lauscher, Anne. - 2021
BASE
7
Accessible digital documentary heritage : guidelines for the preparation of documentary heritage in accessible formats for persons with disabilities ...
Darvishy, Alireza; Manning, Juliet. - : UNESCO, 2020
BASE
8
Erfahrung und Gewissheit – Orientierungen in den Wissenschaften und im Alltag. IV. Regensburger Symposium vom 24.-26. März 2011 ...
Thim-Mabrey, Christiane; Brack, Matthias. - : Universität Regensburg, 2020
BASE
9
Conversational Language Learning for Human-Robot Interaction
Bothe, Chandrakant Ramesh. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2020
BASE
10
Natural Language Visual Grounding via Multimodal Learning ; Natürliche Sprache Visual Grounding durch multimodales Lernen
Mi, Jinpeng. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2020
BASE
11
Erfahrung und Gewissheit – Orientierungen in den Wissenschaften und im Alltag. IV. Regensburger Symposium vom 24.-26. März 2011
Thim-Mabrey, Christiane; Brack, Matthias. - : Universitätsbibliothek Regensburg, 2020
BASE
12
ANNIS: A graph-based query system for deeply annotated text corpora ...
Krause, Thomas. - : Humboldt-Universität zu Berlin, 2019
BASE
13
Generating Formal Representations of System Specification from Natural Language Requirements
Irfan, Zeeshan. - 2019
BASE
14
ANNIS: A graph-based query system for deeply annotated text corpora
Krause, Thomas. - : Humboldt-Universität zu Berlin, 2019
BASE
15
Acquiring Architecture Knowledge for Technology Design Decisions ; Erfassung von Architekturwissen für Technologieentwurfsentscheidungen
Soliman, Mohamed Aboubakr Mohamed. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2019
BASE
16
Adaptive Approaches to Natural Language Processing in Annotation and Application ; Adaptive Ansätze zur Verarbeitung natürlicher Sprache in Annotation und Anwendung
Yimam, Seid Muhie. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2019
BASE
17
Predictive Dependency Parsing ; Vorhersagendes Dependenzparsing
Köhn, Arne. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2019
BASE
18
Automatic generation of lexical recognition tests using natural language processing
BASE
19
Mining and Analyzing User Rationale in Software Engineering ; Gewinnung und Analyse von Nutzerbegründungen in der Softwaretechnik
Kurtanović, Zijad. - : Staats- und Universitätsbibliothek Hamburg Carl von Ossietzky, 2018
BASE
20
Operations on Graphs, Arrays and Automata ; Operationen auf Graphen, Arrays und Automaten
BASE


Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 80
© 2013 - 2024 Lin|gu|is|tik