61 | ŠUSS archive of questions and answers about the Slovenian language (1998-2010)
BASE

65 | Parla-CLARIN: TEI guidelines for corpora of parliamentary proceedings ...
BASE

66 | Parla-CLARIN: TEI guidelines for corpora of parliamentary proceedings ...
BASE

67 | A corpus-based study of 16th-century Slovene clitics and clitic-like elements
BASE

69 | Universal Dependencies 2.2
In: https://hal.archives-ouvertes.fr/hal-01930733 ; 2018 (2018)
BASE

73 | English-Montenegrin parallel corpus of subtitles Opus-MontenegrinSubs 1.0
BASE

77 | Dataset and baseline model of moderated content FRENK-MMC-RTV 1.0
Abstract:
FRENK-MMC-RTV is a dataset of moderated newspaper comments from the website rtvslo.si, with metadata on the time of publishing, user identifier, thread identifier, and whether the comment was deleted by the moderators. The full text of each comment is encrypted via a character-replacement method so that the comments are not readable by humans. Basic punctuation is left unencrypted to enable tokenization. The dataset is mainly intended for experiments on automating comment moderation. For real-world usage, a fastText classification model trained on non-encrypted data is also made available.
Keywords:
computer-mediated communication; content moderation; news comments
URL: http://hdl.handle.net/11356/1201
BASE