7. Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0
9. Comparable corpora of South-Slavic Wikipedias CLASSLA-Wikipedia 1.0
Abstract:
This comparable corpus collection consists of dumps of the Bosnian, Croatian, Macedonian, Montenegrin, Serbian, Serbo-Croatian and Slovenian Wikipedias, harvested on October 17, 2020. The text was extracted from the dumps using the process documented at https://github.com/clarinsi/classla-wikipedia. Linguistic annotation was performed with the classla package (https://pypi.org/project/classla/) at all levels available for each language; the Bosnian and Serbo-Croatian Wikipedias were processed with the standard Croatian models.
Keywords: comparable corpus; Wikipedia
URL: http://hdl.handle.net/11356/1427
12. Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
14. Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.1
18. Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1
19. Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0