
Search in the Catalogues and Directories

Hits 1 – 10 of 10

1
EMBEDDIA tools output example corpus of Estonian, Croatian and Latvian news articles 1.0
Freienthal, Linda; Pelicon, Andraž; Martinc, Matej. - Ekspress Meedia Group / Styria Media Group, 2022
BASE
2
Retweet communities reveal the main sources of hate speech
In: PLoS One (2022)
BASE
3
Slovenian Twitter dataset 2018-2020 1.0
Evkoski, Bojan; Pelicon, Andraž; Mozetič, Igor. - Jožef Stefan Institute, 2021
BASE
4
Italian YouTube Hate Speech Corpus
Cinelli, Matteo; Pelicon, Andraž; Mozetič, Igor. - Jožef Stefan Institute, 2021
BASE
5
Latvian user comment dataset 1.0
Shekhar, Ravi; Purver, Matthew; Pollak, Senja. - Ekspress Meedia Group, 2021
BASE
6
Ekspress user comment dataset 1.0
Shekhar, Ravi; Pollak, Senja; Pelicon, Andraž. - Ekspress Meedia Group, 2021
BASE
7
24sata news comment dataset 1.0
Shekhar, Ravi; Pranjic, Marko; Pollak, Senja. - Styria Media Group, 2021
BASE
8
SimLex-999 Slovenian translation SimLex-999-sl 1.0
Pollak, Senja; Vulić, Ivan; Pelicon, Andraž. - University of Ljubljana, 2021
BASE
9
Investigating cross-lingual training for offensive language detection
In: PeerJ Comput Sci (2021)
Abstract: Platforms that feature user-generated content (social media, online forums, newspaper comment sections etc.) have to detect and filter offensive speech within large, fast-changing datasets. While many automatic methods have been proposed and achieve good accuracies, most of these focus on the English language, and are hard to apply directly to languages in which few labeled datasets exist. Recent work has therefore investigated the use of cross-lingual transfer learning to solve this problem, training a model in a well-resourced language and transferring to a less-resourced target language; but performance has so far been significantly less impressive. In this paper, we investigate the reasons for this performance drop, via a systematic comparison of pre-trained models and intermediate training regimes on five different languages. We show that using a better pre-trained language model results in a large gain in overall performance and in zero-shot transfer, and that intermediate training on other languages is effective when little target-language data is available. We then use multiple analyses of classifier confidence and language model vocabulary to shed light on exactly where these gains come from and gain insight into the sources of the most typical mistakes.
Keyword: Computational Linguistics
URL: https://doi.org/10.7717/peerj-cs.559
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237322/
BASE
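The cross-lingual transfer setup this abstract describes can be sketched in a few lines of code: fine-tune a multilingual pre-trained model on labeled offensive-language data in a well-resourced source language, then evaluate it zero-shot on a less-resourced target language. The following is a minimal sketch assuming the Hugging Face transformers and datasets libraries; the model choice (xlm-roberta-base), the CSV file names, and their text/label columns are illustrative placeholders, not the paper's actual data or models.

    # Minimal sketch of cross-lingual transfer for offensive language detection:
    # train on a source language, evaluate zero-shot on a target language.
    # Assumes CSVs with "text" and "label" columns (hypothetical placeholders).
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)
    import datasets

    MODEL = "xlm-roberta-base"  # a multilingual pre-trained encoder

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    # Labeled data in the well-resourced source language (e.g. English)
    train = datasets.load_dataset("csv", data_files="english_offensive.csv")["train"]
    # Test data in the less-resourced target language
    test = datasets.load_dataset("csv", data_files="target_language_test.csv")["train"]
    train = train.map(tokenize, batched=True)
    test = test.map(tokenize, batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=train,
    )
    trainer.train()                # fine-tune on the source language only
    print(trainer.evaluate(test))  # zero-shot evaluation on the target language
                                   # (reports eval loss by default)

The paper's findings map directly onto this skeleton: a stronger choice of pre-trained model improves both overall and zero-shot performance, and when a little target-language data exists, an intermediate fine-tuning stage on other languages can be added before the final training step.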
10
Sentiment Annotated Dataset of Croatian News
Pelicon, Andraž; Pranjić, Marko; Miljković, Dragana. - Jožef Stefan Institute, 2020
BASE

Hits by source type:
Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 10