
Search in the Catalogues and Directories

Hits 1 – 11 of 11

1
MMTAfrica: Multilingual Machine Translation for African Languages ...
BASE
2
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
In: https://hal.inria.fr/hal-03177623 ; 2021 (2021)
BASE
3
MasakhaNER: Named entity recognition for African languages
In: EISSN: 2307-387X ; Transactions of the Association for Computational Linguistics ; https://hal.inria.fr/hal-03350962 ; Transactions of the Association for Computational Linguistics, The MIT Press, 2021, ⟨10.1162/tacl⟩ (2021)
BASE
4
OkwuGbé: End-to-End Speech Recognition for Fon and Igbo ...
BASE
5
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets ...
Abstract: With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have systematic issues: At least 15 corpora have no usable text, and a significant fraction contains less than 50% sentences of acceptable quality. In addition, many are mislabeled or use nonstandard/ambiguous language codes. We demonstrate that these issues are easy to detect even for non-proficient speakers, and supplement the human audit with automatic analyses. Finally, we recommend techniques to evaluate and improve multilingual corpora and discuss potential risks that come with low-quality data releases. ... : Accepted at TACL; pre-MIT Press publication version ...
Keyword: Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2103.12028
https://arxiv.org/abs/2103.12028
BASE
6
Fon French Daily Dialogues Parallel Data ...
BASE
7
Fon French Daily Dialogues Parallel Data ...
BASE
8
OkwuGbé: End-to-End Speech Recognition for Fon and Igbo ...
BASE
9
FFR v1.1: Fon-French Neural Machine Translation ...
BASE
10
FFR V1.0: Fon-French Neural Machine Translation ...
BASE
11
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
BASE

Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 11