Hits 1 – 9 of 9

1	Abstracts from the KAS corpus KAS-Abs 2.0
	Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2022. : Faculty of Computer and Information Science, University of Ljubljana, 2022
	BASE

2	Corpus of academic Slovene KAS 2.0
	Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2022. : Faculty of Computer and Information Science, University of Ljubljana, 2022
	BASE

3	Summarization datasets from the KAS corpus KAS-Sum 1.0
	Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2022. : Faculty of Computer and Information Science, University of Ljubljana, 2022
	BASE

4	Machine Translation datasets from the KAS corpus KAS-MT 1.0
	Žagar, Aleš; Kavaš, Matic; Robnik-Šikonja, Marko; Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola; Ferme, Marko; Borovič, Mladen; Boškovič, Borko; Ojsteršek, Milan; Hrovat, Goran. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2022. : Faculty of Computer and Information Science, University of Ljubljana, 2022
	Abstract: The Machine Translation datasets KAS-MT 1.0 contain automatically sentence-aligned Slovene and English plain-text abstracts from KAS-Abs 2.0 (http://hdl.handle.net/11356/1449) and are meant for studies in machine translation. The sentence alignment approach relies on an alignment reliability threshold: candidate pairs scoring below the threshold are omitted, so its value represents a trade-off between the quantity and the quality of the aligned pairs. We estimate that the default threshold value produces a good-quality dataset for most users. We release three datasets (files) that reflect this trade-off. The Normal dataset uses the default reliability threshold and contains 496,102 sentence pairs, the Strict dataset contains 474,852 sentence pairs, and the Very Strict dataset contains 425,534 sentence pairs. A file with thesis metadata is also included. The first column in each of the three TSV files gives the confidence that the alignment is correct (higher is better), the second and third contain the source and target sentences (Slovene and English), and the fourth gives the “merged” state, i.e. whether sentences in the source or target language were merged (sentences do not always map one-to-one). The last column gives the thesis ID. Reference: Žagar, A., Kavaš, M., & Robnik Šikonja, M. (2021). Corpus KAS 2.0: cleaner and with new datasets. In Information Society - IS 2021: Proceedings of the 24th International Multiconference. https://doi.org/10.5281/zenodo.5562228
	Keyword: academic writing; BSc/BA theses; machine translation; MSc/MA theses; PhD theses
	URL: http://hdl.handle.net/11356/1447
	BASE
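
The five-column TSV layout described in the abstract of hit 4 could be read along the following lines. This is only a minimal sketch, not the authors' tooling: the names AlignedPair and read_pairs, the field names, and the sample rows are hypothetical, and the code assumes that the files carry no quoting and that the confidence column parses as a number. Rows that do not fit that shape (for instance a possible header line) are skipped.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class AlignedPair(NamedTuple):
    """One row of a KAS-MT style TSV file (field names are hypothetical)."""
    confidence: float  # column 1: alignment confidence, higher is better
    slovene: str       # column 2: Slovene sentence
    english: str       # column 3: English sentence
    merged: str        # column 4: "merged" state, kept as raw text
    thesis_id: str     # column 5: thesis ID


def read_pairs(lines: Iterable[str], min_confidence: float = 0.0) -> Iterator[AlignedPair]:
    """Yield rows whose confidence is at least min_confidence."""
    # QUOTE_NONE: treat quote characters inside sentences as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 5:
            continue  # skip rows that do not have the five described columns
        try:
            confidence = float(row[0])
        except ValueError:
            continue  # skip rows without a numeric confidence (e.g. a header)
        if confidence >= min_confidence:
            yield AlignedPair(confidence, row[1], row[2], row[3], row[4])


# Made-up sample rows, for illustration only.
sample = [
    "0.93\tTo je stavek.\tThis is a sentence.\t0\t123\n",
    "0.41\tDrugi stavek.\tAnother sentence.\t1\t124\n",
]
for pair in read_pairs(sample, min_confidence=0.5):
    print(pair.thesis_id, pair.confidence, pair.english)
```

With a downloaded file, the same function can be fed an open file handle, for example open(path, encoding="utf-8", newline=""), where path is whichever of the three dataset files was obtained. Raising min_confidence mimics, on the reader's side, the quantity versus quality trade-off that the Normal, Strict and Very Strict files represent.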

5	Slovene SuperGLUE Benchmark: Translation and Evaluation ...
	Žagar, Aleš; Robnik-Šikonja, Marko. - : arXiv, 2022
	BASE

6	Unsupervised Approach to Cross-Lingual User Comments Summarization ...
	Žagar, Aleš; Robnik-Šikonja, Marko. - : Zenodo, 2021
	BASE

7	Unsupervised Approach to Cross-Lingual User Comments Summarization ...
	Žagar, Aleš; Robnik-Šikonja, Marko. - : Zenodo, 2021
	BASE

8	Evaluation of contextual embeddings on less-resourced languages ...
	Ulčar, Matej; Žagar, Aleš; Armendariz, Carlos S.. - : arXiv, 2021
	BASE

9	Slovene translation of SuperGLUE
	Žagar, Aleš; Robnik-Šikonja, Marko; Goli, Teja. - : Faculty of Computer and Information Science, University of Ljubljana, 2020
	BASE
