DE eng

Search in the Catalogues and Directories

Hits 1 – 3 of 3

1
Family of Origin and Family of Choice: Massively Parallel Lexiconized Iterative Pretraining for Severely Low Resource Machine Translation ...
Zhou, Zhong; Waibel, Alex. - : arXiv, 2021
Abstract: We translate a closed text that is known in advance into a severely low resource language by leveraging massive source parallelism. In other words, given a text in 124 source languages, we translate it into a severely low resource language using only ~1,000 lines of low resource data without any external help. Firstly, we propose a systematic method to rank and choose source languages that are close to the low resource language. We call the linguistic definition of language family Family of Origin (FAMO), and we call the empirical definition of higher-ranked languages using our metrics Family of Choice (FAMC). Secondly, we build an Iteratively Pretrained Multilingual Order-preserving Lexiconized Transformer (IPML) to train on ~1,000 lines (~3.5%) of low resource data. To translate named entities correctly, we build a massive lexicon table for 2,939 Bible named entities in 124 source languages, and include many that occur once and covers more than 66 severely low resource languages. Moreover, we also build a ...
Keyword: Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences
URL: https://arxiv.org/abs/2104.05848
https://dx.doi.org/10.48550/arxiv.2104.05848
BASE
Hide details
2
Using Interlinear Glosses as Pivot in Low-Resource Multilingual Machine Translation ...
BASE
Show details
3
Paraphrases as Foreign Languages in Multilingual Neural Machine Translation ...
BASE
Show details

Catalogues
0
0
0
0
0
0
0
Bibliographies
0
0
0
0
0
0
0
0
0
Linked Open Data catalogues
0
Online resources
0
0
0
0
Open access documents
3
0
0
0
0
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern