1 | Predicting lexical complexity in English texts: the CompLex 2.0 dataset | BASE
2 | Comparing Approaches to Dravidian Language Identification ... | BASE
3 | An Exploratory Analysis of the Relation Between Offensive Language and Mental Health ... | BASE
4 | Multilingual Offensive Language Identification for Low-resource Languages ... | BASE
Abstract: Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g. hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partially because most available annotated datasets contain English data. In this paper, we take advantage of available English datasets by applying cross-lingual contextual word embeddings and transfer learning to make predictions in low-resource languages. We project predictions on comparable data in Arabic, Bengali, Danish, Greek, Hindi, Spanish, and Turkish. We report results of 0.8415 F1 macro for Bengali in the TRAC-2 shared task, 0.8532 F1 macro for Danish and 0.8701 F1 macro for Greek in OffensEval 2020, 0.8568 F1 macro for Hindi in the HASOC 2019 shared task, and 0.7513 F1 macro for Spanish in SemEval-2019 Task 5 (HatEval), showing that our approach compares favourably to the ...
Accepted to ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP). This is an extended version of a paper accepted to EMNLP. arXiv admin note: substantial text overlap with arXiv:2010.05324 ...
Keyword: Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG; Social and Information Networks cs.SI
URL: https://dx.doi.org/10.48550/arxiv.2105.05996 https://arxiv.org/abs/2105.05996
6 | WLV-RIT at SemEval-2021 Task 5: A Neural Transformer Framework for Detecting Toxic Spans ... | BASE
7 | An Exploratory Analysis of the Relation between Offensive Language and Mental Health ... | BASE
8 | Handling Extreme Class Imbalance in Technical Logbook Datasets ... | BASE
9 | SOLID: A Large-Scale Semi-Supervised Dataset for Offensive Language Identification ... | BASE
11 | LCP-RIT at SemEval-2021 Task 1: Exploring Linguistic Features for Lexical Complexity Prediction ... | BASE
16 | A Computational Exploration of Pejorative Language in Social Media ... | BASE
18 | Multilingual offensive language identification for low-resource languages | In: 21 ; 1 (2021) | BASE
20 | Overview of the HASOC subtrack at FIRE 2021: Hate speech and offensive content identification in English and Indo-Aryan languages and conversational hate speech | In: 1 ; 3 (2021) | BASE