8. The ParlaMint corpora of parliamentary proceedings
In: Lang Resour Eval (2022)
15. Offensive language dataset of Croatian, English and Slovenian comments FRENK 1.0
17. Comparable corpora of South-Slavic Wikipedias CLASSLA-Wikipedia 1.0
Abstract: This comparable corpus collection consists of dumps of the Bosnian, Croatian, Macedonian, Montenegrin, Serbian, Serbo-Croatian and Slovenian Wikipedias, harvested on October 17th, 2020. The text was extracted from the dumps with the process documented at https://github.com/clarinsi/classla-wikipedia. Linguistic annotation was performed with the classla package (https://pypi.org/project/classla/) on all levels available for each language; the Bosnian and Serbo-Croatian Wikipedias were processed with the standard Croatian models.
Keywords: comparable corpus; Wikipedia
URL: http://hdl.handle.net/11356/1427
20. Multilingual comparable corpora of parliamentary debates ParlaMint 2.1