1. The FLORES-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation ...
BASE
2. LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models ...
BASE
3. Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications ...
BASE
5. Alternative Input Signals Ease Transfer in Multilingual Machine Translation ...
Abstract:
Recent work in multilingual machine translation (MMT) has focused on the potential of positive transfer between languages, particularly cases where higher-resourced languages can benefit lower-resourced ones. While training an MMT model, the supervision signals learned from one language pair can be transferred to the other via the tokens shared by multiple source languages. However, the transfer is inhibited when the token overlap among source languages is small, which manifests naturally when languages use different writing systems. In this paper, we tackle inhibited transfer by augmenting the training data with alternative signals that unify different writing systems, such as phonetic, romanized, and transliterated input. We test these signals on Indic and Turkic languages, two language families where the writing systems differ but languages still share common features. Our results indicate that a straightforward multi-source self-ensemble -- training a model on a mixture of various signals and ensembling ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2110.07804 https://arxiv.org/abs/2110.07804
BASE
6. Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data ...
BASE
7. AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages ...
BASE
8. Multilingual Translation with Extensible Multilingual Pretraining and Finetuning ...
BASE
9. MLQE-PE: A Multilingual Quality Estimation and Post-Editing Dataset ...
BASE
10. Beyond English-Centric Multilingual Machine Translation ...
BASE
11. Unsupervised Cross-lingual Representation Learning at Scale ...
BASE
12. WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia ...
BASE