
Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Offline Corpus Augmentation for English-Amharic Machine Translation
In: 2022 The 5th International Conference on Information and Computer Technologies, Mar 2022, New York, United States (2022) ; https://hal.archives-ouvertes.fr/hal-03547539
Abstract: The purpose of this study was to investigate the effect of corpus augmentation on the quality of English-Amharic Machine Translation (MT). Trigram and four-gram Statistical Machine Translation (SMT) language models, as well as Neural Machine Translation (NMT) models based on Gated Recurrent Units (GRU), were used. Each model was trained independently on the original corpus and on the augmented corpus to see how augmentation affects translation quality. The original and augmented corpora contain 225,304 and 463,796 English-Amharic parallel sentences, respectively. The corpus was augmented with an offline token-level augmentation technique, applied before any other MT processing started. Among several token-level augmentation mechanisms, random insertion, replacement, deletion, and swapping were chosen and implemented. After the models had been trained, Bilingual Evaluation Understudy (BLEU) scores were collected and analyzed. The results demonstrate that the models trained with the augmented corpus outperform their counterparts trained with the original corpus in terms of BLEU score. We therefore conclude that corpus augmentation improved the performance of both the SMT and NMT translation systems.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Amharic language; Corpus Augmentation; GRU; Machine Translation; NMT; SMT; Token level augmentation
URL: https://hal.archives-ouvertes.fr/hal-03547539
https://hal.archives-ouvertes.fr/hal-03547539/file/ICICT2022Augmented_corpusFinal%20Draft.pdf
https://hal.archives-ouvertes.fr/hal-03547539/document
BASE
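
The four token-level operations named in the abstract (random insertion, replacement, deletion, and swapping) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not say how inserted or replacement tokens are chosen, which side of the parallel corpus is modified, or how many variants are produced per sentence. The sketch assumes whitespace tokenization, a hypothetical `vocab` list as the source of inserted and replacement tokens, augmentation of the source side only, and one augmented variant per sentence pair, so it does not reproduce the corpus sizes reported above.

```python
import random


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, keeping at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]


def random_swap(tokens, rng):
    """Swap two randomly chosen positions (no-op for fewer than 2 tokens)."""
    out = list(tokens)
    if len(out) < 2:
        return out
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


def random_insertion(tokens, vocab, rng):
    """Insert one token drawn from vocab (hypothetical word list) at a random position."""
    out = list(tokens)
    out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
    return out


def random_replacement(tokens, vocab, rng):
    """Replace one randomly chosen token with a token drawn from vocab."""
    out = list(tokens)
    if not out:
        return out
    out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def augment_corpus(pairs, vocab, p_delete=0.1, seed=0):
    """Offline augmentation: return the original pairs plus one variant per pair.

    Assumption: only the source sentence is perturbed; the target is reused as-is.
    """
    rng = random.Random(seed)
    ops = [
        lambda t: random_deletion(t, p_delete, rng),
        lambda t: random_swap(t, rng),
        lambda t: random_insertion(t, vocab, rng),
        lambda t: random_replacement(t, vocab, rng),
    ]
    augmented = list(pairs)
    for src, tgt in pairs:
        op = rng.choice(ops)
        augmented.append((" ".join(op(src.split())), tgt))
    return augmented
```

Because the augmentation runs offline, `augment_corpus` would be called once on the full list of sentence pairs, and the resulting list would then be passed unchanged to whatever SMT or NMT training pipeline is in use.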
2
Corpus Augmentation for Neural Machine Translation with Chinese-Japanese Parallel Corpora
In: Applied Sciences ; Volume 9 ; Issue 10 (2019)
BASE
3
Augmented Role Filling Capabilities for Semantic Interpretation of Spoken Language
In: DTIC (1991)
BASE
4
Massively Parallel Network Architectures for Automatic Recognition of Visual Speech Signals
In: DTIC AND NTIS (1990)
BASE
5
Research on Narrowband Communications
In: DTIC AND NTIS (1981)
BASE
