2
A Transformer-Based Neural Machine Translation Model for Arabic Dialects That Utilizes Subword Units
In: Sensors ; Volume 21 ; Issue 19 (2021)
Abstract:
Languages that allow free word order, such as Arabic dialects, pose significant difficulty for neural machine translation (NMT) because they contain many rare words that NMT systems translate poorly. Because NMT systems operate with a fixed-size vocabulary, out-of-vocabulary words are represented by unknown-word (UNK) tokens. Here, rare words are instead encoded entirely as sequences of subword pieces using the Word-Piece Model. This paper introduces the first Transformer-based NMT model for Arabic dialects that employs subword units. The proposed solution is based on the recently introduced Transformer model. Using subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (MSA, the target language) improves the behavior of the encoder's multi-head attention sublayers by capturing the overall dependencies between the words of the input dialect sentence. Experiments are carried out on five translation tasks: Levantine Arabic (LEV) to MSA, Maghrebi Arabic (MAG) to MSA, Gulf to MSA, Nile to MSA, and Iraqi Arabic (IRQ) to MSA. The experiments confirm that the proposed model adequately addresses the unknown-word problem and improves the quality of translation from Arabic dialects to MSA.
Keywords:
Arabic dialects; modern standard Arabic; multi-head attention; neural machine translation (NMT); self-attention; shared vocabulary; subword units; transformer
URL: https://doi.org/10.3390/s21196509
BASE

4
The Development of Case Morphology and Sentential Word Order in Arabic as a Second Language: A Processability Perspective
In: Theses and Dissertations (2021)
BASE

5
Automatic speech recognition and machine translation of Arabic and dialectal videos ; Reconnaissance et traduction automatique de la parole de vidéos arabes et dialectales
In: https://hal.univ-lorraine.fr/tel-03132934 ; Informatique et langage [cs.CL]. Université de Lorraine, 2020. Français. ⟨NNT : 2020LORR0157⟩ (2020)
BASE

6
Linguistic Analysis and Automatic Information Extraction of Semantic Relations in Arabic ; Analyse linguistique et extraction automatique de relations sémantiques des textes en arabe
In: https://hal.archives-ouvertes.fr/tel-03572307 ; Linguistique. Université Bourgogne Franche-Comté, 2020. Français (2020)
BASE

7
An Arabic Dataset for Disease Named Entity Recognition with Multi-Annotation Schemes ...
BASE

8
An Arabic Dataset for Disease Named Entity Recognition with Multi-Annotation Schemes ...
BASE

9
Remarks on Modern Standard Arabic Construct State and Quantification
In: Theses and Dissertations (2020)
BASE

10
THE PLACE DEIXIS OF MODERN STANDARD ARABIC: A CLOSER LOOK AT THE DIMENSIONAL SYSTEM AND THE FACTORS THAT CONTROL THE CHOICE OF PLACE DEICTIC EXPRESSIONS
BASE

11
Integrating Dialects and Dialectology in the Curriculum of Teaching Arabic As a Foreign Language (TAFL)
BASE

12
Grammatical Gender Processing in Standard Arabic as a First and a Second Language ...
Alamry, Ali. - : Université d'Ottawa / University of Ottawa, 2019
BASE

13
Concord and agreement features in Modern Standard Arabic
In: Glossa: a journal of general linguistics; Vol 4, No 1 (2019); 91 ; 2397-1835 (2019)
BASE

14
Grammatical Gender Processing in Standard Arabic as a First and a Second Language
Alamry, Ali. - : Université d'Ottawa / University of Ottawa, 2019
BASE

15
Phonetic-Form constraints in Arabic coordination
In: Lingua Posnaniensis, Vol 61, Iss 1, Pp 23-42 (2019)
BASE

16
The influence of emphatic /dˁ/ on Modern Standard Arabic vowels: An acoustic analysis
In: Lingua Posnaniensis, Vol 61, Iss 1, Pp 43-61 (2019)
BASE

17
Creating Parallel Arabic Dialect Corpus: Pitfalls to Avoid
In: 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLING) ; https://hal.archives-ouvertes.fr/hal-01557405 ; 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLING), Apr 2017, Budapest, Hungary (2017)
BASE

18
Seeking control in Modern Standard Arabic
In: Glossa: a journal of general linguistics; Vol 2, No 1 (2017); 90 ; 2397-1835 (2017)
BASE

19
Adaptation of a Term Extractor to Arabic Specialised Texts: First Experiments and Limits
In: International Conference on Intelligent Text Processing and Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-01771875 ; International Conference on Intelligent Text Processing and Computational Linguistics, Springer, Jan 2016, Konya, Turkey (2016)
BASE

20
An Algerian dialect: Study and Resources
In: ISSN: 2158-107X ; EISSN: 2156-5570 ; International journal of advanced computer science and applications (IJACSA) ; https://hal.archives-ouvertes.fr/hal-01297415 ; International journal of advanced computer science and applications (IJACSA), The Science and Information Organization, 2016, 7 (3), pp.384-396. ⟨10.14569/IJACSA.2016.070353⟩ (2016)
BASE