1 | Automatic Detection of Entity-Manipulated Text using Factual Knowledge ... | BASE
2 | Towards Afrocentric NLP for African Languages: Where We Are and Where We Can Go ... | BASE
3 | Self-Training Pre-Trained Language Models for Zero- and Few-Shot Multi-Dialectal Arabic Sequence Labeling ... | BASE
4 | Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation ... | BASE
5 | NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task ... | BASE
6 | Translating the Unseen? Yoruba-English MT in Low-Resource, Morphologically-Unmarked Settings ... | BASE
7 | Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing ... | BASE
8 | ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ... | BASE
9 | AraT5: Text-to-Text Transformers for Arabic Language Generation ... | BASE
10 | ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic ... | BASE
11 | NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ... | BASE
12 | DiaLex: A Benchmark for Evaluating Multidialectal Arabic Word Embeddings ... | BASE
13 | Mega-COV: A Billion-Scale Dataset of 100+ Languages for COVID-19 ... | BASE
Abstract: We describe Mega-COV, a billion-scale dataset from Twitter for studying COVID-19. The dataset is diverse (covers 268 countries), longitudinal (goes back as far as 2007), multilingual (comes in 100+ languages), and has a significant number of location-tagged tweets (~169M tweets). We release tweet IDs from the dataset. We also develop and release two powerful models, one for identifying whether or not a tweet is related to the pandemic (best F1=97%) and another for detecting misinformation about COVID-19 (best F1=92%). A human annotation study reveals the utility of our models on a subset of Mega-COV. Our data and models can be useful for studying a wide host of phenomena related to the pandemic. Mega-COV and our models are publicly available. ...
Keywords: Computation and Language cs.CL; FOS Computer and information sciences; Social and Information Networks cs.SI
URL: https://arxiv.org/abs/2005.06012 https://dx.doi.org/10.48550/arxiv.2005.06012
14 | One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble ... | BASE
15 | Toward Micro-Dialect Identification in Diaglossic and Code-Switched Environments ... | BASE
16 | Automatic Detection of Machine Generated Text: A Critical Survey ... | BASE
17 | NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task ... | BASE
18 | Proceedings of the Fifth Arabic Natural Language Processing Workshop | BASE
19 | AraWEAT: Multidimensional analysis of biases in Arabic word embeddings | BASE
20 | AraNet: A Deep Learning Toolkit for Arabic Social Media ... | BASE