
Search in the Catalogues and Directories

Hits 1 – 7 of 7

1
Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics ...
BASE
2
LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading ...
Abstract: The aim of this work is to investigate the impact of crossmodal self-supervised pre-training for speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio and visual streams in videos. We propose LipSound2, which consists of an encoder-decoder architecture and a location-aware attention mechanism to map face image sequences directly to mel-scale spectrograms without requiring any human annotations. The proposed LipSound2 model is first pre-trained on ~2400 h of multilingual (e.g. English and German) audio-visual data (VoxCeleb2). To verify the generalizability of the proposed method, we then fine-tune the pre-trained model on domain-specific datasets (GRID, TCD-TIMIT) for English speech reconstruction and achieve a significant improvement in speech quality and intelligibility compared to previous approaches in speaker-dependent and speaker-independent settings. In addition to English, we conduct Chinese speech reconstruction on the CMLR dataset to verify the impact on transferability. ... : Submitted to IEEE Transactions on Neural Networks and Learning Systems ...
Keyword: Artificial Intelligence cs.AI; Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2112.04748
https://arxiv.org/abs/2112.04748
BASE
3
Towards a self-organizing pre-symbolic neural model representing sensorimotor primitives ...
BASE
4
Incorporating End-to-End Speech Recognition Models for Sentiment Analysis ...
BASE
5
Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots ...
BASE
6
GradAscent at EmoInt-2017: Character- and Word-Level Recurrent Neural Network Models for Tweet Emotion Intensity Detection ...
BASE
7
Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture ...
Heinrich, Stefan; Wermter, Stefan. - : arXiv, 2017
BASE
