1. Delving Deeper into Cross-lingual Visual Question Answering ...
Abstract:
Visual question answering (VQA) is one of the crucial vision-and-language tasks. Yet, the bulk of research until recently has focused only on the English language due to the lack of appropriate evaluation resources. Previous work on cross-lingual VQA has reported poor zero-shot transfer performance of current multilingual multimodal Transformers and large gaps to monolingual performance, attributed mostly to misalignment of text embeddings between the source and target languages, without providing any additional deeper analyses. In this work, we delve deeper and address different aspects of cross-lingual VQA holistically, aiming to understand the impact of input data, fine-tuning and evaluation regimes, and interactions between the two modalities in cross-lingual setups. 1) We tackle low transfer performance via novel methods that substantially reduce the gap to monolingual English performance, yielding +10 accuracy points over existing transfer methods. 2) We study and dissect cross-lingual VQA across ...
Keywords:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://arxiv.org/abs/2202.07630 https://dx.doi.org/10.48550/arxiv.2202.07630
BASE
2. Cross-Lingual Dialogue Dataset Creation via Outline-Based Generation ...
3. Improving Word Translation via Two-Stage Contrastive Learning ...
5. Crossing the Conversational Chasm: A Primer on Natural Language Processing for Multilingual Task-Oriented Dialogue Systems ...
6. Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking ...
7. Combining Deep Generative Models and Multi-lingual Pretraining for Semi-supervised Document Classification ...
8. MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
9. Context vs Target Word: Quantifying Biases in Lexical Semantic Datasets ...
10. AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples ...
11. Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
12. XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning ...
13. Emergent Communication Pretraining for Few-Shot Machine Translation ...
14. A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters ...
15. Verb Knowledge Injection for Multilingual Event Processing ...
16. Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity ...
17. Probing Pretrained Language Models for Lexical Semantics ...
18. The Secret is in the Spectra: Predicting Cross-lingual Task Performance with Spectral Similarity Measures ...
19. Specializing Unsupervised Pretraining Models for Word-Level Semantic Similarity ...
20. Do We Really Need Fully Unsupervised Cross-Lingual Embeddings? ...