1 |
The Impact of Removing Head Movements on Audio-visual Speech Enhancement
In: ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE Signal Processing Society, May 2022, Singapore, Singapore, pp. 1-5 (2022) ; https://hal.inria.fr/hal-03551610
BASE
2 |
An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
BASE
3 |
BBC-Oxford British Sign Language Dataset
In: https://hal.archives-ouvertes.fr/hal-03516444 ; 2022 (2022)
BASE
4 |
Can machines learn to see without visual databases?
In: https://hal.archives-ouvertes.fr/hal-03526569 ; 2022 (2022)
BASE
5 |
Unsupervised quantification of entity consistency between photos and text in real-world news ...
Müller-Budack, Eric. - : Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2022
BASE
6 |
Large-scale Bilingual Language-Image Contrastive Learning ...
Abstract:
This paper is a technical report sharing our experience and findings from building a Korean and English bilingual multimodal model. While many multimodal datasets focus on English, and multilingual multimodal research relies on machine-translated texts, such texts are limited in their ability to describe unique expressions, cultural information, and proper nouns in languages other than English. In this work, we collect 1.1 billion image-text pairs (708 million Korean and 476 million English) and train a bilingual multimodal model named KELIP. We introduce simple yet effective training schemes, including MAE pre-training and multi-crop augmentation. Extensive experiments demonstrate that a model trained with these schemes shows competitive performance in both languages. Moreover, we discuss multimodal-related research questions: 1) strong augmentation-based methods can distract the model from learning proper multimodal relations; 2) training a multimodal model without cross-lingual relation can ... : Accepted by ICLRW2022 ...
Keyword:
Computation and Language cs.CL; Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2203.14463 https://arxiv.org/abs/2203.14463
BASE
7 |
Bridging Video-text Retrieval with Multiple Choice Questions ...
BASE
8 |
Towards a Perceptual Model for Estimating the Quality of Visual Speech ...
BASE
9 |
An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production ...
BASE
10 |
Expression-preserving face frontalization improves visually assisted speech processing ...
BASE
11 |
WLASL-LEX: a Dataset for Recognising Phonological Properties in American Sign Language ...
BASE
12 |
Modeling Intensification for Sign Language Generation: A Computational Approach ...
BASE
13 |
Keypoint based Sign Language Translation without Glosses ...
BASE
14 |
A Transformer-Based Contrastive Learning Approach for Few-Shot Sign Language Recognition ...
BASE
15 |
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation ...
BASE
16 |
Including Facial Expressions in Contextual Embeddings for Sign Language Generation ...
BASE
17 |
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production ...
BASE
18 |
Statistical and Spatio-temporal Hand Gesture Features for Sign Language Recognition using the Leap Motion Sensor ...
BASE
19 |
Multi-View Spatial-Temporal Network for Continuous Sign Language Recognition ...
BASE
20 |
Gesture based Arabic Sign Language Recognition for Impaired People based on Convolution Neural Network ...
BASE