Search in the Catalogues and Directories







Hits 1 – 10 of 10

1	Training corpus ssj500k 2.3
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

2	Training corpus ssj500k 2.2
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019
	BASE

3	Croatian Twitter training corpus ReLDI-NormTagNER-hr 2.1
	Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk. - : Jožef Stefan Institute, 2019
	BASE

4	CMC training corpus Janes-Tag 2.1
	Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka. - : Jožef Stefan Institute, 2019
	BASE

5	Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.1
	Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk. - : Jožef Stefan Institute, 2019
	BASE

6	Training corpus jos1M 1.2
	Erjavec, Tomaž; Krek, Simon; Dobrovoljc, Kaja. - : Jožef Stefan Institute, 2019
	BASE

7	Training corpus SETimes.SR 1.0
	Batanović, Vuk; Ljubešić, Nikola; Samardžić, Tanja. - : Regional Linguistic Data Initiative Centre ReLDI, 2018
	BASE

8	Training corpus hr500k 1.0
	Ljubešić, Nikola; Agić, Željko; Klubička, Filip. - : Jožef Stefan Institute, 2018
	BASE

9	Reference corpus of historical Slovene goo300k 1.2
	Erjavec, Tomaž. - : Jožef Stefan Institute, 2015
	Abstract: goo300k is a manually annotated reference corpus of historical Slovene. It contains 1,100 pages (about 300,000 tokens) sampled from 89 texts from the period 1584–1899. Each text contains extensive metadata and per-page links to facsimiles, while the word tokens in the texts are annotated with their modernised word-form, lemma, part-of-speech, and, for archaic words, their nearest modern synonyms or a short explanation. The corpus is available in source TEI P5 XML and in the simpler and smaller vertical format used by various concordancers. Note that the vertical format does not contain all the information from the source TEI.
	Keyword: historical language; lemmatisation; manual annotation; part-of-speech tagging; TEI; word modernisation
	URL: http://hdl.handle.net/11356/1025
	BASE

10	MULTEXT-East "1984" annotated corpus 4.0
	Erjavec, Tomaž; Barbu, Ana-Maria; Derzhanski, Ivan. - : Jožef Stefan Institute, 2015
	BASE
