13. Training corpus ssj500k 1.3
Abstract:
The ssj500k training corpus is based on two training corpora built within the JOS project (http://nl.ijs.si/jos/). It contains the jos100k corpus and additional material from the jos1M corpus, forming a training corpus of 500,000 words that has been manually checked and annotated on the levels of tokenization, segmentation, morphosyntactic tagging, syntactic dependency parsing and named entities. The ssj500k corpus uses the JOS morphosyntactic tagset with 1,902 tags and dependencies with 10 labels. The part of the corpus annotated with dependency relations contains 11,411 sentences; named entities are annotated in the original jos100k part of the corpus.
Keywords:
dependency treebank; manual annotation; named entities; parsing; tagging; TEI; tokenisation
URL: http://hdl.handle.net/11356/1029
BASE
17. Main results of MONDILEX project
In: Cognitive Studies | Études cognitives; No 11 (2011); 265-290 ; 2392-2397 (2015)
18. MONDILEX – towards the research infrastructure for digital resources in Slavic lexicography
In: Cognitive Studies | Études cognitives; No 10 (2010); 147-162 ; 2392-2397 (2015)
19. The Japanese-Slovene dictionary jaSlo: its development, enhancement and use
In: Cognitive Studies | Études cognitives; No 10 (2010); 203-216 ; 2392-2397 (2015)