Hits 1 – 7 of 7

1	Grounding Hindsight Instructions in Multi-Goal Reinforcement Learning for Robotics ...
	Röder, Frank; Eppe, Manfred; Wermter, Stefan. - : arXiv, 2022
	BASE

2	LipSound2: Self-Supervised Pre-Training for Lip-to-Speech Reconstruction and Lip Reading ...
	Qu, Leyuan; Weber, Cornelius; Wermter, Stefan. - : arXiv, 2021
	Abstract: The aim of this work is to investigate the impact of crossmodal self-supervised pre-training on speech reconstruction (video-to-audio) by leveraging the natural co-occurrence of audio and visual streams in videos. We propose LipSound2, which consists of an encoder-decoder architecture and a location-aware attention mechanism, to map face image sequences directly to mel-scale spectrograms without requiring any human annotations. The proposed LipSound2 model is first pre-trained on ~2400 h of multilingual (e.g. English and German) audio-visual data (VoxCeleb2). To verify the generalizability of the proposed method, we then fine-tune the pre-trained model on domain-specific datasets (GRID, TCD-TIMIT) for English speech reconstruction and achieve a significant improvement in speech quality and intelligibility compared to previous approaches in speaker-dependent and speaker-independent settings. In addition to English, we conduct Chinese speech reconstruction on the CMLR dataset to verify the impact on transferability. ... : Submitted to IEEE Transactions on Neural Networks and Learning Systems ...
	Keyword: Artificial Intelligence cs.AI; Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://dx.doi.org/10.48550/arxiv.2112.04748 https://arxiv.org/abs/2112.04748
	BASE

3	Towards a self-organizing pre-symbolic neural model representing sensorimotor primitives ...
	Zhong, Junpei; Cangelosi, Angelo; Wermter, Stefan. - : arXiv, 2020
	BASE

4	Incorporating End-to-End Speech Recognition Models for Sentiment Analysis ...
	Lakomkin, Egor; Zamani, Mohammad Ali; Weber, Cornelius. - : arXiv, 2019
	BASE

5	Towards Dialogue-based Navigation with Multivariate Adaptation driven by Intention and Politeness for Social Robots ...
	Bothe, Chandrakant; Garcia, Fernando; Maya, Arturo Cruz. - : arXiv, 2018
	BASE

6	GradAscent at EmoInt-2017: Character- and Word-Level Recurrent Neural Network Models for Tweet Emotion Intensity Detection ...
	Lakomkin, Egor; Bothe, Chandrakant; Wermter, Stefan. - : arXiv, 2018
	BASE

7	Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture ...
	Heinrich, Stefan; Wermter, Stefan. - : arXiv, 2017
	BASE
