1 |
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
|
|
|
|
In: Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03527328 ; Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021), Jan 2022, punta cana, Dominican Republic ; https://aclanthology.org/2021.wnut-1.47/ (2022)
|
|
BASE
|
|
Show details
|
|
2 |
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
|
|
|
|
In: https://hal.inria.fr/hal-03109187 ; 2021 (2021)
|
|
BASE
|
|
Show details
|
|
3 |
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios? ...
|
|
|
|
Abstract:
Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages. Building language models and, more generally, NLP systems for non-standardized and low-resource languages remains a challenging task. In this work, we focus on North-African colloquial dialectal Arabic written using an extension of the Latin script, called NArabizi, found mostly on social media and messaging communication. In this low-resource scenario with data displaying a high level of variability, we compare the downstream performance of a character-based language model on part-of-speech tagging and dependency parsing to that of monolingual and multilingual models. We show that a character-based model trained on only 99k sentences of NArabizi and fined-tuned on a small treebank of this language leads to performance close to those obtained with the same architecture pre-trained on large multilingual and monolingual ... : Camera ready version. Accepted to WNUT 2021 ...
|
|
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
|
|
URL: https://arxiv.org/abs/2110.13658 https://dx.doi.org/10.48550/arxiv.2110.13658
|
|
BASE
|
|
Hide details
|
|
4 |
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering ...
|
|
|
|
BASE
|
|
Show details
|
|
5 |
Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering ...
|
|
|
|
BASE
|
|
Show details
|
|
|
|