2 | Universal Dependencies 2.2
In: https://hal.archives-ouvertes.fr/hal-01930733 (2018)
BASE
6 | English-Montenegrin parallel corpus of subtitles Opus-MontenegrinSubs 1.0
BASE
7 | Training corpus SETimes.SR 1.0
Abstract:
The SETimes.SR training corpus contains 86 726 tokens manually annotated at the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities. The annotations (and other aspects of the corpus) are documented in the teiHeader and back elements of the TEI-encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications, http://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf.
Keywords:
dependency treebank; manual annotation; named entities; parsing; part-of-speech tagging; TEI; tokenisation
URL: http://hdl.handle.net/11356/1200
BASE
10 | Dataset and baseline model of moderated content FRENK-MMC-RTV 1.0
BASE
18 | Dataset and baseline model of moderated content FRENK-STYRIA-24sata 1.0
BASE
19 | hr500k – A Reference Training Corpus of Croatian.
In: Conference papers (2018)
BASE