Search in the Catalogues and Directories

Hits 1 – 9 of 9

1
The Impact of Removing Head Movements on Audio-visual Speech Enhancement
In: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.inria.fr/hal-03551610 ; ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE Signal Processing Society, May 2022, Singapore, Singapore. pp.1-5 (2022)
BASE
2
Expression-preserving face frontalization improves visually assisted speech processing ...
BASE
3
Robust Face Frontalization For Visual Speech Recognition
In: International Conference on Computer Vision Workshops ; https://hal.inria.fr/hal-03326002 ; International Conference on Computer Vision Workshops, IEEE, Oct 2021, Montreal - Virtual, Canada ; http://iccv2021.thecvf.com/home (2021)
BASE
4
Robust Face Frontalization For Visual Speech Recognition
In: ICCV 2021 - International Conference on Computer Vision Workshops ; https://hal.inria.fr/hal-03326002 ; ICCV 2021 - International Conference on Computer Vision Workshops, IEEE, Oct 2021, Montreal - Virtual, Canada. pp.1-16 ; http://iccv2021.thecvf.com/home (2021)
BASE
5
Robust Face Frontalization For Visual Speech Recognition
In: ICCV 2021 - International Conference on Computer Vision Workshops ; https://hal.inria.fr/hal-03326002 ; ICCV 2021 - International Conference on Computer Vision Workshops, IEEE, Oct 2021, Montreal - Virtual, Canada. pp.1-16 ; http://iccv2021.thecvf.com/home (2021)
BASE
6
Narrow-band Deep Filtering for Multichannel Speech Enhancement
In: https://hal.inria.fr/hal-02378413 ; 2020 (2020)
Abstract: In this paper, we address the problem of multichannel speech enhancement in the short-time Fourier transform (STFT) domain. A long short-term memory (LSTM) network takes as input a sequence of STFT coefficients associated with a frequency bin of multichannel noisy-speech signals. The network's output is the corresponding sequence of single-channel cleaned speech. We propose several clean-speech network targets, namely, the magnitude ratio mask, the complex STFT coefficients and the (smoothed) spatial filter. A prominent feature of the proposed model is that the same LSTM architecture, with identical parameters, is trained across frequency bins. The proposed method is referred to as narrow-band deep filtering. This choice stands in contrast to traditional wideband speech enhancement methods. The proposed deep filtering is able to discriminate between speech and noise by exploiting their different temporal and spatial characteristics: speech is non-stationary and spatially coherent, while noise is relatively stationary and weakly correlated across channels. This is similar in spirit to unsupervised techniques, such as spectral subtraction and beamforming. We describe extensive experiments with both mixed signals (noise is added to clean speech) and real signals (live recordings). We empirically evaluate the proposed architecture variants using speech enhancement and speech recognition metrics, and we compare our results with those obtained with several state-of-the-art methods. In light of these experiments, we conclude that narrow-band deep filtering has very good speech enhancement and speech recognition performance, and excellent generalization capabilities in terms of speaker variability and noise type.
Keyword: [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; deep filtering; LSTM; recurrent neural networks; Speech denoising; Speech enhancement
URL: https://hal.inria.fr/hal-02378413
https://hal.inria.fr/hal-02378413v2/file/multichannel_lstm.pdf
https://hal.inria.fr/hal-02378413v2/document
BASE
7
Face Frontalization Based on Robustly Fitting a Deformable Shape Model to 3D Landmarks
In: https://hal.archives-ouvertes.fr/hal-02980346 ; 2020 (2020)
BASE
8
Voice Activity Detection Based on Statistical Likelihood Ratio With Adaptive Thresholding
In: IWAENC 2016 - International Workshop on Acoustic Signal Enhancement (IWAENC) ; https://hal.inria.fr/hal-01349776 ; IWAENC 2016 - International Workshop on Acoustic Signal Enhancement (IWAENC), Sep 2016, Xi'an, China. pp.1-5, ⟨10.1109/IWAENC.2016.7602911⟩ (2016)
BASE
9
RAVEL: an annotated corpus for training robots with audiovisual abilities [Journal]
Alameda-Pineda, Xavier [author]; Sanchez-Riera, Jordi [author]; Wienke, Johannes [author].
DNB Subject Category Language

Catalogues: 1
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 8