1. Morphological Processing of Low-Resource Languages: Where We Are and What's Next
   Source: BASE

2. Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability

3. Don't Rule Out Monolingual Speakers: A Method For Crowdsourcing Machine Translation Data

4. Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

5. How to Adapt Your Pretrained Multilingual Model to 1600 Languages
   Abstract: Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer, performing best for languages seen during pretraining. While methods exist to improve performance for unseen languages, they have almost exclusively been evaluated using amounts of raw text only available for a small fraction of the world's languages. In this paper, we evaluate the performance of existing methods to adapt PMMs to new languages using a resource available for over 1600 languages: the New Testament. This is challenging for two reasons: (1) the small corpus size, and (2) the narrow domain. While performance drops for all approaches, we surprisingly still see gains of up to 17.69% accuracy for part-of-speech tagging and 6.29 F1 for NER on average over all languages as compared to XLM-R. Another unexpected finding is that continued pretraining, the simplest approach, performs best. Finally, we perform a case study to disentangle the effects of domain and size and to shed light on the influence of the ...
   Comments: Accepted to ACL 2021
   Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
   URL: https://arxiv.org/abs/2106.02124 | https://dx.doi.org/10.48550/arxiv.2106.02124

6. Findings of the AmericasNLP 2021 Shared Task on Open Machine Translation for Indigenous Languages of the Americas

7. PROST: Physical Reasoning about Objects through Space and Time

11. AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages

12. CLiMP: A Benchmark for Chinese Language Model Evaluation

13. Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas
   In: Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas. Edited by: Mager, Manuel; Oncevay, Arturo; Rios, Annette; Meza Ruiz, Ivan Vladimir; Palmer, Alexis; Neubig, Graham; Kann, Katharina. Online: Association for Computational Linguistics (2021)

15. Learning to Learn Morphological Inflection for Resource-Poor Languages

16. Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge
   In: Proceedings of the Society for Computation in Linguistics (2020)

17. Probing for Semantic Classes: Diagnosing the Meaning Content of Word Embeddings

20. Grammatical Gender, Neo-Whorfianism, and Word Embeddings: A Data-Driven Approach to Linguistic Relativity