Hits 1 – 20 of 30

1	Towards a Perceptual Model for Estimating the Quality of Visual Speech ...
	Aldeneh, Zakaria; Fedzechkina, Masha; Seto, Skyler. - : arXiv, 2022
	BASE

2	An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production ...
	Roy, Anwesha; Belagali, Varun; Ghosh, Prasanta Kumar. - : arXiv, 2022
	BASE

3	Expression-preserving face frontalization improves visually assisted speech processing ...
	Kang, Zhiqi; Sadeghi, Mostafa; Horaud, Radu; Alameda-Pineda, Xavier. - : arXiv, 2022
	Abstract: Face frontalization consists of synthesizing a frontally-viewed face from an arbitrarily-viewed one. The main contribution of this paper is a frontalization methodology that preserves non-rigid facial deformations in order to boost the performance of visually assisted speech communication. The method alternates between the estimation of (i) the rigid transformation (scale, rotation, and translation) and (ii) the non-rigid deformation between an arbitrarily-viewed face and a face model. The method has two important merits: it can deal with non-Gaussian errors in the data and it incorporates a dynamical face deformation model. For that purpose, we use the generalized Student t-distribution in combination with a linear dynamic system in order to account for both rigid head motions and time-varying facial deformations caused by speech production. We propose to use the zero-mean normalized cross-correlation (ZNCC) score to evaluate the ability of the method to preserve facial expressions. The method is thoroughly ...
	Keyword: Audio and Speech Processing eess.AS; Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://dx.doi.org/10.48550/arxiv.2204.02810 https://arxiv.org/abs/2204.02810
	BASE

4	Freeform Body Motion Generation from Speech ...
	Xu, Jing; Zhang, Wei; Bai, Yalong. - : arXiv, 2022
	BASE

5	Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
	Peng, Puyuan; Harwath, David. - : arXiv, 2022
	BASE

6	Speaker Extraction with Co-Speech Gestures Cue ...
	Pan, Zexu; Qian, Xinyuan; Li, Haizhou. - : arXiv, 2022
	BASE

7	Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations ...
	Um, Se-Yun; Kim, Jihyun; Lee, Jihyun. - : arXiv, 2021
	BASE

8	Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI ...
	Pandey, Laxmi; Arif, Ahmed Sabbir. - : arXiv, 2021
	BASE

9	Improving Ultrasound Tongue Image Reconstruction from Lip Images Using Self-supervised Learning and Attention Mechanism ...
	Liu, Haiyang; Zhang, Jihan. - : arXiv, 2021
	BASE

10	Cascaded Multilingual Audio-Visual Learning from Videos ...
	Rouditchenko, Andrew; Boggust, Angie; Harwath, David. - : arXiv, 2021
	BASE

11	Speaker embeddings by modeling channel-wise correlations ...
	Stafylakis, Themos; Rohdin, Johan; Burget, Lukas. - : arXiv, 2021
	BASE

12	Can phones, syllables, and words emerge as side-products of cross-situational audiovisual learning? -- A computational investigation ...
	Khorrami, Khazar; Räsänen, Okko. - : arXiv, 2021
	BASE

13	AudioViewer: Learning to Visualize Sounds ...
	Zhang, Yuchi; Song, Chunjin; Peng, Willis. - : arXiv, 2020
	BASE

14	Large-scale multilingual audio visual dubbing ...
	Yang, Yi; Shillingford, Brendan; Assael, Yannis. - : arXiv, 2020
	BASE

15	Cross-modal Speaker Verification and Recognition: A Multilingual Perspective ...
	Saeed, Muhammad Saad; Nawaz, Shah; Morerio, Pietro. - : arXiv, 2020
	BASE

16	Designing, Playing, and Performing with a Vision-based Mouth Interface ...
	Lyons, Michael J.; Haehnel, Michael; Tetsutani, Nobuji. - : arXiv, 2020
	BASE

17	SLNSpeech: solving extended speech separation problem by the help of sign language ...
	Wu, Jiasong; Li, Taotao; Kong, Youyong. - : arXiv, 2020
	BASE

18	Unsupervised Audiovisual Synthesis via Exemplar Autoencoders ...
	Deng, Kangle; Bansal, Aayush; Ramanan, Deva. - : arXiv, 2020
	BASE

19	UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation ...
	Luo, Huaishao; Ji, Lei; Shi, Botian. - : arXiv, 2020
	BASE

20	Disentangled Speech Embeddings using Cross-modal Self-supervision ...
	Nagrani, Arsha; Chung, Joon Son; Albanie, Samuel. - : arXiv, 2020
	BASE
