41 |
Dataset construction for the detection of anti-social behaviour in online communication in Arabic
|
|
|
|
Abstract:
Peer-reviewed. Warning: this paper contains a range of words which may cause offence. In recent years, many studies target anti-social behaviour such as offensive language and cyberbullying in online communication. Typically, these studies collect data from various reachable sources, the majority of the datasets being in English. However, to the best of our knowledge, there is no dataset collected from the YouTube platform targeting Arabic text and overall there are only a few datasets of Arabic text, collected from other social platforms for the purpose of offensive language detection. Therefore, in this paper we contribute to this field by presenting a dataset of YouTube comments in Arabic, specifically designed to be used for the detection of offensive language in a machine learning scenario. Our dataset contains a range of offensive language and flaming in the form of YouTube comments. We document the labelling process we have conducted, taking into account the difference in the Arab dialects and the diversity of perception of offensive language throughout the Arab world. Furthermore, statistical analysis of the dataset is presented, in order to make it ready for use as a training dataset for predictive modelling.
|
|
Keywords:
Anti-social behaviour online; Arabic dataset; Arabic dialects; harassment detection; offensive language; text classification; text mining
|
|
URL: http://hdl.handle.net/10344/7878 https://doi.org/10.1016/j.procs.2018.10.473
|
|
BASE
|
|
42 |
Algunos proverbios de actual uso en Damasco ; Some proverbs in current use in Damascus
|
|
|
|
In: Dialectologia: revista electrònica; 2018: Núm. 20; p. 43-60 (2018)
|
|
BASE
|
|
43 |
Maghrebi Arabic dialect processing: an overview
|
|
|
|
In: ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing ; https://hal.inria.fr/hal-01660001 ; ICNLSSP 2017 - International Conference on Natural Language, Signal and Speech Processing, ISGA, Dec 2017, Casablanca, Morocco (2017)
|
|
BASE
|
|
45 |
Une approche linguistique pour la détection des dialectes arabes [A linguistic approach to the detection of Arabic dialects]
|
|
|
|
In: Actes de TALN 2017 ; 2017-06-26 ; https://hal.archives-ouvertes.fr/hal-02012244 ; 2017-06-26, 2017, Orléans, France (2017)
|
|
BASE
|
|
46 |
Creating Parallel Arabic Dialect Corpus: Pitfalls to Avoid
|
|
|
|
In: 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLING) ; https://hal.archives-ouvertes.fr/hal-01557405 ; 18th International Conference on Computational Linguistics and Intelligent Text Processing (CICLING), Apr 2017, Budapest, Hungary (2017)
|
|
BASE
|
|
47 |
Proceedings of the International Conference on Natural Language Processing, Signal and Speech Processing
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-03349724 ; 2017, 978-9954-99-758-1 (2017)
|
|
BASE
|
|
50 |
Refugee Migration, Dialect Contact, And Morphophonemic Change In Palestinian Arabic ...
|
|
|
|
BASE
|
|
51 |
Durative aspect markers in modern Arabic dialects : cross-dialectal functions and historical development
|
|
|
|
BASE
|
|
52 |
The phonetic and phonological status of the r-phones in Tunisian Arabic
|
|
|
|
In: Lingua Posnaniensis, Vol 59, Iss 2, Pp 69-85 (2017)
|
|
BASE
|
|
53 |
The effect of social factors on emphatic-plain contrast in Jordan: a sociophonetic study of Arabic in Amman City ; Doctor of Philosophy
|
|
|
|
BASE
|
|
54 |
Morphologically Annotated Corpora and Morphological Analyzers for Moroccan and Sanaani Yemeni Arabic
|
|
|
|
In: 10th Language Resources and Evaluation Conference (LREC 2016) ; https://hal.archives-ouvertes.fr/hal-01349201 ; 10th Language Resources and Evaluation Conference (LREC 2016), May 2016, Portoroz, Slovenia (2016)
|
|
BASE
|
|
55 |
A Large Scale Corpus of Gulf Arabic
|
|
|
|
In: Language Resources and Evaluation Conference ; https://hal.archives-ouvertes.fr/hal-01349204 ; Language Resources and Evaluation Conference, 2016, Portoroz, Slovenia (2016)
|
|
BASE
|
|
56 |
Invitation in Saudi Arabic : a socio-pragmatic analysis ; Title on signature form: Invitation in Saudi culture : socio-pragmatic analysis.
|
|
|
|
BASE
|
|
57 |
On interaction between external and internal markers in expressing aspect in Arabic dialect varieties
|
|
|
|
In: Aspectuality and Temporality: Descriptive and theoretical issues ; https://halshs.archives-ouvertes.fr/halshs-01477484 ; Aspectuality and Temporality: Descriptive and theoretical issues, pp. 325-355, 2016, 0165-7763 (2016)
|
|
BASE
|
|
58 |
Language Contact in the Sahara
|
|
|
|
In: https://halshs.archives-ouvertes.fr/halshs-01376150 ; 2016, ⟨10.1093/acrefore/9780199384655.013.141⟩ (2016)
|
|
BASE
|
|
60 |
A Sociophonetic Account Of Morphophonemic Variation In Palestinian Arabic ...
|
|
|
|
BASE
|
|
|
|