1 | Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences ... | BASE
3 | Distributional Discrepancy: A Metric for Unconditional Text Generation ... | BASE
6 | FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation ... | BASE
7 | Exemplar-Controllable Paraphrasing and Translation using Bitext ... | BASE
10 | Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding ... | BASE
11 | SpellGCN: Incorporating Phonological and Visual Similarities into Language Models for Chinese Spelling Check ... | BASE
12 | Cross-lingual Word Embeddings beyond Zero-shot Machine Translation ... | BASE
13 | Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation ... | BASE
14 | Synchronous Bidirectional Learning for Multilingual Lip Reading ... | BASE
15 | Multilingual Translation with Extensible Multilingual Pretraining and Finetuning ... | BASE
16 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding ... | BASE
18 | #Election2020: The First Public Twitter Dataset on the 2020 US Presidential Election ... | BASE
19 | Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation ... | BASE
20 | Model Selection for Cross-Lingual Transfer ... | BASE

Abstract:
Transformers pre-trained on multilingual corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer capabilities. In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language. While this works surprisingly well, substantial variance has been observed in target-language performance across different fine-tuning runs, and in the zero-shot setup no target-language development data is available for selecting among multiple fine-tuned models. Prior work has relied on English dev data to select among models fine-tuned with different learning rates, numbers of steps, and other hyperparameters, often resulting in suboptimal choices. In this paper, we show that it is possible to select consistently better models when small amounts of annotated data are available in auxiliary pivot languages. We propose a machine learning approach to model selection that uses the fine-tuned model's own internal ...
Venue: EMNLP 2021

Keyword:
Computation and Language (cs.CL); FOS: Computer and information sciences; Machine Learning (cs.LG)

URL: https://arxiv.org/abs/2010.06127
DOI: https://dx.doi.org/10.48550/arxiv.2010.06127
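
The abstract is truncated, but the selection step it motivates can be sketched: given several fine-tuning runs and small annotated dev sets in auxiliary pivot languages, keep the run that scores best on average across those sets. A minimal sketch under that assumption, in Python; the function names and the `evaluate` callback are hypothetical illustrations, not the paper's actual code (the paper's full method additionally learns from the model's internal representations):

```python
# Hypothetical sketch: pivot-language model selection for zero-shot
# cross-lingual transfer. `evaluate(checkpoint, dev_set)` is assumed to
# run a fine-tuned model (e.g., mBERT or XLM-RoBERTa) on a dev set and
# return a task score such as accuracy.
from statistics import mean
from typing import Callable


def select_checkpoint(
    checkpoints: list[str],
    pivot_dev_sets: list[str],
    evaluate: Callable[[str, str], float],
) -> str:
    # Score each fine-tuning run by its mean score across the small
    # annotated pivot-language dev sets, then keep the best run.
    scores = {
        ckpt: mean(evaluate(ckpt, dev) for dev in pivot_dev_sets)
        for ckpt in checkpoints
    }
    return max(scores, key=scores.get)
```

For example, among runs fine-tuned on English data with different seeds and learning rates, this would keep the checkpoint with the highest average accuracy on a few hundred annotated pivot-language examples, rather than relying on English dev performance alone.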