Hits 1 – 20 of 29

1	Training corpus ssj500k 2.3
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2021
	BASE

2	Training corpus ssj500k 2.2
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019
	BASE

3	Croatian Twitter training corpus ReLDI-NormTagNER-hr 2.1
	Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk. - : Jožef Stefan Institute, 2019
	BASE

4	CMC training corpus Janes-Tag 2.1
	Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka; Arhar Holdt, Špela; Ljubešić, Nikola; Zupan, Katja; Dobrovoljc, Kaja. - : Jožef Stefan Institute, 2019
	Abstract: Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is intended as a gold-standard training and testing dataset for tokenisation, sentence segmentation, word normalisation, morphosyntactic tagging, lemmatisation and named entity annotation of non-standard Slovene. Because the corpus has been carefully annotated by hand, it is also suitable for detailed linguistic explorations that require highly accurate and reliable annotations. As an update to version 2.0, this version corrects some minor errors in NER annotation and adds, alongside the MULTEXT-East morphosyntactic descriptions, Universal Dependencies morphological features and a CoNLL-U version of the corpus. The UD features are also included in the vert file. The first version of this corpus is described in: (1) ERJAVEC, Tomaž, ČIBEJ, Jaka, ARHAR HOLDT, Špela, LJUBEŠIĆ, Nikola, FIŠER, Darja. 2016. Gold-standard datasets for annotation of Slovene computer-mediated communication. In Proceedings of RASLAN 2016: Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2016, pp. 29-40, https://nlp.fi.muni.cz/raslan/raslan16.pdf; (2) FIŠER, Darja, LJUBEŠIĆ, Nikola, ERJAVEC, Tomaž. 2018. The Janes project: language resources and tools for Slovene user generated content. Language Resources & Evaluation. https://rdcu.be/7RX4. A related corpus, Janes-Norm, is also available; cf. http://hdl.handle.net/11356/1084.
	Keyword: computer-mediated communication; lemmatisation; manual annotation; named entities; part-of-speech tagging; TEI; tokenisation; word normalisation
	URL: http://hdl.handle.net/11356/1238
	BASE

5	Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.1
	Ljubešić, Nikola; Erjavec, Tomaž; Batanović, Vuk. - : Jožef Stefan Institute, 2019
	BASE

6	Tokeniser for the Alsatian Dialects ...
	Bernhard, Delphine. - : Zenodo, 2018
	BASE

7	Tokeniser for the Alsatian Dialects ...
	Bernhard, Delphine. - : Zenodo, 2018
	BASE

8	Tokeniser For The Alsatian Dialects ...
	Bernhard, Delphine. - : Zenodo, 2018
	BASE

9	Training corpus SETimes.SR 1.0
	Batanović, Vuk; Ljubešić, Nikola; Samardžić, Tanja. - : Regional Linguistic Data Initiative Centre ReLDI, 2018
	BASE

10	Training corpus ssj500k 2.1
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2018
	BASE

11	Training corpus hr500k 1.0
	Ljubešić, Nikola; Agić, Željko; Klubička, Filip. - : Jožef Stefan Institute, 2018
	BASE

12	CMC training corpus Janes-Tag 2.0
	Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka. - : Jožef Stefan Institute, 2017
	BASE

13	Croatian Twitter training corpus ReLDI-NormTag-hr 1.1
	Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
	BASE

14	Serbian Twitter training corpus ReLDI-NormTag-sr 1.0
	Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
	BASE

15	Croatian Twitter training corpus ReLDI-NormTag-hr 1.0
	Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
	BASE

16	Serbian Twitter training corpus ReLDI-NormTagNER-sr 2.0
	Ljubešić, Nikola; Erjavec, Tomaž; Miličević, Maja. - : Jožef Stefan Institute, 2017
	BASE

17	Training corpus ssj500k 2.0
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2017
	BASE

18	CMC training corpus Janes-Syn 1.0
	Arhar Holdt, Špela; Erjavec, Tomaž; Fišer, Darja. - : Jožef Stefan Institute, 2017
	BASE

19	Serbian Twitter training corpus ReLDI-NormTag-sr 1.1
	Ljubešić, Nikola; Farkaš, Daša; Klubička, Filip. - : Jožef Stefan Institute, 2017
	BASE

20	Croatian Twitter training corpus ReLDI-NormTagNER-hr 2.0
	Ljubešić, Nikola; Erjavec, Tomaž; Miličević, Maja. - : Jožef Stefan Institute, 2017
	BASE
