1. Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity
   In: Computational Linguistics, MIT Press, 2020, 46(4), pp. 847-897. ISSN: 0891-2017; EISSN: 1530-9312.
   URL: https://direct.mit.edu/coli/article/46/4/847/97326 ; https://hal.archives-ouvertes.fr/hal-02975786
   Source: BASE
2. Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations
4. Multidirectional Associative Optimization of Function-Specific Word Representations
5. XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
6. Emergent Communication Pretraining for Few-Shot Machine Translation
7. Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer
8. MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
9. How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
   Abstract: In this work, we provide a systematic and comprehensive empirical comparison of pretrained multilingual language models versus their monolingual counterparts with regard to their monolingual task performance. We study a set of nine typologically diverse languages with readily available pretrained monolingual models on a set of five diverse monolingual downstream tasks. We first aim to establish, via fair and controlled comparisons, whether a gap exists between the multilingual and the corresponding monolingual representation of a given language, and subsequently investigate the reason for any performance difference. To disentangle conflating factors, we train new monolingual models on the same data, with monolingually and multilingually trained tokenizers. We find that while the pretraining data size is an important factor, a designated monolingual tokenizer plays an equally important role in the downstream performance. Our results show that languages that are adequately represented in the multilingual model's ... (ACL 2021)
   Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
   URL: https://arxiv.org/abs/2012.15613 ; https://dx.doi.org/10.48550/arxiv.2012.15613
10. UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
13. Manual Clustering and Spatial Arrangement of Verbs for Multilingual Evaluation and Typology Analysis
14. XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages
16. From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers
20. A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters