1 |
The effect of domain and diacritics in Yorùbá-English neural machine translation
In: 18th Biennial Machine Translation Summit, Aug 2021, Orlando, United States (2021); https://hal.inria.fr/hal-03350967
5 |
A Data Augmentation Approach for Sign-Language-To-Text Translation In-The-Wild ...
6 |
The Effect of Domain and Diacritics in Yorùbá-English Neural Machine Translation ...
7 |
Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages ...
8 |
Comparing Feature-Engineering and Feature-Learning Approaches for Multilingual Translationese Classification ...
10 |
Automatic classification of human translation and machine translation : a study from the perspective of lexical diversity
11 |
Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction ...
17 |
Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction
In: Computational Linguistics, Vol 46, Iss 2, Pp 249-255 (2020)
18 |
GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies ...
Abstract:
We introduce GeBioToolkit, a tool for extracting multilingual parallel corpora at sentence level, with document and gender information, from Wikipedia biographies. Despite the gender inequalities present in Wikipedia, the toolkit has been designed to extract a gender-balanced corpus. While our toolkit is customizable to any number of languages (and different domains), in this work we present a corpus of 2,000 sentences in English, Spanish and Catalan, which has been post-edited by native speakers to become a high-quality dataset for machine translation evaluation. While GeBioCorpus aims at being one of the first non-synthetic gender-balanced test datasets, GeBioToolkit aims at paving the path to standardize procedures to produce gender-balanced datasets ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://arxiv.org/abs/1912.04778 https://dx.doi.org/10.48550/arxiv.1912.04778
19 |
Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi ...
20 |
Query Translation for Cross-lingual Search in the Academic Search Engine PubPsych ...