22 |
Automatic identification methods on a corpus of twenty five fine-grained Arabic dialects
|
|
|
|
In: Arabic Language Processing: From Theory to Practice. 7th International Conference, ICALP 2019, Nancy, France, October 16–17, 2019, Proceedings, Communications in Computer and Information Science book series (CCIS, volume 1108), 2019, ⟨10.1007/978-3-030-32959-4_6⟩ ; https://hal.archives-ouvertes.fr/hal-02314245 (2019)
|
|
BASE
|
|
23 |
The SMarT Classifier for Arabic Fine-Grained Dialect Identification
|
|
|
|
In: MADAR Shared Task: Arabic Fine-Grained Dialect Identification ; Dialect identification campaign ; The Fourth Arabic Natural Language Processing Workshop co-located with ACL, Aug 2019, Florence, Italy ; https://hal.archives-ouvertes.fr/hal-02166384 (2019)
|
|
BASE
|
|
24 |
Script Independent Morphological Segmentation for Arabic Maghrebi Dialects: An Application to Machine Translation
|
|
|
|
In: Computación y sistemas, Instituto Politécnico Nacional IPN Centro de Investigación en Computación, In press, 23 (3), pp. 979-989 ; ISSN: 1405-5546 ; EISSN: 2007-9737 ; ⟨10.13053/cys-23-3-3267⟩ ; https://hal.archives-ouvertes.fr/hal-02274533 (2019)
|
|
BASE
|
|
25 |
Markers in urban Hijazi discourse
|
|
|
|
BASE
|
|
26 |
Compliments and compliment responses in Saudi Arabic in text-based computer-mediated communication
|
|
|
|
BASE
|
|
27 |
Gender differences in Saudi Arabic question formation on Twitter
|
|
|
|
BASE
|
|
28 |
Integrating Dialects and Dialectology in the Curriculum of Teaching Arabic As a Foreign Language (TAFL)
|
|
|
|
BASE
|
|
29 |
The phonology and micro-typology of Arabic R
|
|
|
|
In: Glossa: a journal of general linguistics; Vol 4, No 1 (2019); 131 ; 2397-1835 (2019)
|
|
BASE
|
|
30 |
Durative aspect markers in modern Arabic dialects : cross-dialectal functions and historical development ...
|
|
|
|
BASE
|
|
35 |
Automatic Identification of Maghreb Dialects Using a Dictionary-Based Approach
|
|
|
|
In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan ; https://hal.archives-ouvertes.fr/hal-02012150 (2018)
|
|
BASE
|
|
36 |
Statistical Machine Translation: Application to low resourced languages ; Traduction Automatique Fondée sur des Méthodes Statistiques : Application aux Langues peu Dotées en Ressources
|
|
|
|
In: https://hal.inria.fr/tel-03186940 ; Computation and Language [cs.CL]. École Supérieure d’Informatique, 2018. English (2018)
|
|
BASE
|
|
37 |
La communication entre Libanais et Jordaniens sur les réseaux numériques ; Communication Practices Between Lebanese and Jordanians on Digital Networks
|
|
|
|
In: Hermès [ISSN 0767-9513], Nouvelles voix de la recherche en communication, 2018, 82, p. 216 (2018)
|
|
BASE
|
|
38 |
A Multitask-Based Neural Machine Translation Model with Part-of-Speech Tags Integration for Arabic Dialects
|
|
|
|
In: Applied Sciences ; Volume 8 ; Issue 12 (2018)
|
|
BASE
|
|
40 |
Multi-dialect Arabic broadcast speech recognition
|
|
|
|
Abstract:
Dialectal Arabic speech research suffers from a lack of labelled resources and standardised orthography. There are three main challenges in dialectal Arabic speech recognition: (i) finding labelled dialectal Arabic speech data, (ii) training robust dialectal speech recognition models from limited labelled data and (iii) evaluating speech recognition for dialects with no orthographic rules. This thesis makes the following three contributions.
Arabic Dialect Identification: We mainly deal with Arabic speech without prior knowledge of the spoken dialect. Arabic dialects can be sufficiently diverse that one can argue they are different languages rather than dialects of the same language. We make two contributions here. First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected from the Al Jazeera TV channel. From almost 1,000 hours, we obtained utterance-level dialect labels for 57 hours of high-quality speech covering four major varieties of dialectal Arabic (DA): Egyptian, Levantine, Gulf or Arabic peninsula, and North African or Moroccan. Second, we build an Arabic dialect identification (ADI) system. We explore two main groups of features: acoustic and linguistic. For the linguistic features, we look at a wide range of features covering words, characters and phonemes. For the acoustic features, we look at raw features such as mel-frequency cepstral coefficients combined with shifted delta cepstra (MFCC-SDC), bottleneck features and the i-vector as a latent variable. We study both generative and discriminative classifiers, in addition to deep learning approaches, namely deep neural networks (DNN) and convolutional neural networks (CNN). We frame Arabic dialect identification as a five-class challenge comprising the four dialects mentioned above as well as Modern Standard Arabic.
Arabic Speech Recognition: We describe our effort in building Arabic automatic speech recognition (ASR) and in creating an open research community to advance it. This part has two main goals. First, creating a framework for Arabic ASR that is publicly available for research. We describe our effort in building two multi-genre broadcast (MGB) challenges. MGB-2 focuses on broadcast news, using more than 1,200 hours of speech and 130M words of text collected from the broadcast domain. MGB-3, however, focuses on dialectal multi-genre data with limited non-orthographic speech collected from YouTube, with special attention paid to transfer learning. Second, building a robust Arabic ASR system and reporting a competitive word error rate (WER) that can serve as a benchmark to advance the state of the art in Arabic ASR. Our overall system is a combination of five acoustic models (AM): unidirectional long short-term memory (LSTM), bidirectional LSTM (BLSTM), time delay neural network (TDNN), TDNN layers along with LSTM layers (TDNN-LSTM) and finally TDNN layers followed by BLSTM layers (TDNN-BLSTM). The AMs are trained using purely sequence-trained neural networks with lattice-free maximum mutual information (LFMMI). The generated lattices are rescored using a four-gram language model (LM) and a recurrent neural network with maximum entropy (RNNME) LM. Our official WER is 13%, which is the lowest WER reported on this task.
Evaluation: The third part of the thesis addresses our effort in evaluating dialectal speech with no orthographic rules. Our methods learn from multiple transcribers and align the speech hypothesis to overcome the non-orthographic aspects. Our multi-reference WER (MR-WER) approach is similar to the BLEU score used in machine translation (MT). We have also automated this process by learning different spelling variants from Twitter data: we automatically mine a huge collection of tweets in an unsupervised fashion to build more than 11M n-to-m lexical pairs, and we propose a new evaluation metric, dialectal WER (WERd). Finally, we estimate the word error rate (e-WER) with no reference transcription using decoding and language features. We show that our word error rate estimation is robust in many scenarios, with and without the decoding features.
|
|
Keywords:
Arabic dialects; Arabic speech research; convolutional neural network; crowdsourcing; deep neural network; MFCC-SDC; standardised orthography
|
|
URL: http://hdl.handle.net/1842/31224
|
|
BASE
|
|