1 |
Delving Deeper into Cross-lingual Visual Question Answering ...
|
|
|
|
Abstract:
Visual question answering (VQA) is one of the crucial vision-and-language tasks. Yet, the bulk of research until recently has focused only on the English language due to the lack of appropriate evaluation resources. Previous work on cross-lingual VQA has reported poor zero-shot transfer performance of current multilingual multimodal Transformers and large gaps to monolingual performance, attributed mostly to misalignment of text embeddings between the source and target languages, without providing any additional deeper analyses. In this work, we delve deeper and address different aspects of cross-lingual VQA holistically, aiming to understand the impact of input data, fine-tuning and evaluation regimes, and interactions between the two modalities in cross-lingual setups. 1) We tackle low transfer performance via novel methods that substantially reduce the gap to monolingual English performance, yielding +10 accuracy points over existing transfer methods. 2) We study and dissect cross-lingual VQA across ...
|
|
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
|
|
URL: https://arxiv.org/abs/2202.07630 https://dx.doi.org/10.48550/arxiv.2202.07630
|
|
BASE
|
|
Hide details
|
|
2 |
Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation ...
|
|
|
|
BASE
|
|
Show details
|
|
3 |
Improving Word Translation via Two-Stage Contrastive Learning ...
|
|
|
|
BASE
|
|
Show details
|
|
6 |
Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems ...
|
|
|
|
BASE
|
|
Show details
|
|
7 |
Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking ...
|
|
|
|
BASE
|
|
Show details
|
|
8 |
Combining Deep Generative Models and Multi-lingual Pretraining for Semi-supervised Document Classification ...
|
|
|
|
BASE
|
|
Show details
|
|
9 |
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
10 |
Parameter space factorization for zero-shot learning across tasks and languages ...
|
|
|
|
BASE
|
|
Show details
|
|
11 |
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
12 |
Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking ...
|
|
|
|
BASE
|
|
Show details
|
|
13 |
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
|
|
|
|
BASE
|
|
Show details
|
|
14 |
Semantic Data Set Construction from Human Clustering and Spatial Arrangement ...
|
|
|
|
BASE
|
|
Show details
|
|
15 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
|
|
|
|
BASE
|
|
Show details
|
|
16 |
Context vs Target Word: Quantifying Biases in Lexical Semantic Datasets ...
|
|
|
|
BASE
|
|
Show details
|
|
17 |
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples ...
|
|
|
|
BASE
|
|
Show details
|
|
18 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
|
|
|
|
BASE
|
|
Show details
|
|
19 |
Parameter space factorization for zero-shot learning across tasks and languages
|
|
|
|
In: Transactions of the Association for Computational Linguistics, 9 (2021)
|
|
BASE
|
|
Show details
|
|
20 |
AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples ...
|
|
|
|
BASE
|
|
Show details
|
|
|
|