Search in the Catalogues and Directories
Hits 1 – 8 of 8

1	Training corpus ssj500k 2.3
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

2	Training corpus ssj500k 2.2
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. Centre for Language Resources and Technologies, University of Ljubljana, 2019
	BASE

3	Training corpus SETimes.SR 1.0
	Batanović, Vuk; Ljubešić, Nikola; Samardžić, Tanja. Regional Linguistic Data Initiative Centre ReLDI, 2018
	BASE

4	Training corpus ssj500k 2.1
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. Centre for Language Resources and Technologies, University of Ljubljana, 2018
	BASE

5	Training corpus hr500k 1.0
	Ljubešić, Nikola; Agić, Željko; Klubička, Filip. Jožef Stefan Institute, 2018
	BASE

6	Training corpus ssj500k 2.0
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž; Može, Sara; Ledinek, Nina; Holz, Nanika; Zupan, Katja; Gantar, Polona; Kuzman, Taja. Centre for Language Resources and Technologies, University of Ljubljana, 2017
	Abstract: The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation. About half of the corpus is also manually annotated with syntactic dependencies, named entities, and verbal multiword expressions. The annotations of the ssj500k corpus follow (1) the MULTEXT-East V5 morphosyntactic specifications for Slovene, http://nl.ijs.si/ME/V5/msd/, (2) the JOS dependency schema, http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf, (3) the Janes annotation guidelines for Slovenian named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf, and (4) the guidelines of the PARSEME shared task on verbal multiword expressions, http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.0/. The vocabularies of (1) and (2) are provided in the back element, and those of (3) and (4) in the teiHeader, of the TEI-encoded corpus.
	Keyword: dependency treebank; manual annotation; named entities; parsing; tagging; TEI; tokenisation; verbal multiword expressions
	URL: http://hdl.handle.net/11356/1165
	BASE

7	Training corpus ssj500k 1.4
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. Centre for Language Resources and Technologies, University of Ljubljana, 2016
	BASE

8	Training corpus ssj500k 1.3
	Krek, Simon; Erjavec, Tomaž; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2015
	BASE

© 2013–2024 Lin|gu|is|tik