81. Adapting BigScience Multilingual Model to Unseen Languages ...
82. On Efficiently Acquiring Annotations for Multilingual Models ...
83. Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models ...
84. Does Corpus Quality Really Matter for Low-Resource Languages? ...
85. IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages ...
86. mSLAM: Massively multilingual joint pre-training for speech and text ...
87. On the Representation Collapse of Sparse Mixture of Experts ...
88. Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom ...
89. L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models ...
Abstract:
Social media platforms are used by a large number of people prominently to express their thoughts and opinions. However, these platforms have contributed to a substantial amount of hateful and abusive content as well. Therefore, it is important to curb the spread of hate speech on these platforms. In India, Marathi is one of the most popular languages used by a wide audience. In this work, we present L3Cube-MahaHate, the first major Hate Speech Dataset in Marathi. The dataset is curated from Twitter and annotated manually. Our dataset consists of over 25000 distinct tweets labeled into four major classes, i.e., hate, offensive, profane, and not. We present the approaches used for collecting and annotating the data and the challenges faced during the process. Finally, we present baseline classification results using deep learning models based on CNN, LSTM, and Transformers. We explore mono-lingual and multi-lingual variants of BERT like MahaBERT, IndicBERT, mBERT, and xlm-RoBERTa and show that mono-lingual models ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
URL: https://dx.doi.org/10.48550/arxiv.2203.13778 https://arxiv.org/abs/2203.13778
90. Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts ...
91. A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model ...
92. A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
93. Factual Consistency of Multilingual Pretrained Language Models ...
94. Examining Scaling and Transfer of Language Model Architectures for Machine Translation ...
95. MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
96. Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi ...
100. From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction ...