Hits 1 – 19 of 19

1	The Janes project: language resources and tools for Slovene user generated content [Journal]
	Fišer, Darja [author]; Ljubešić, Nikola [other]; Erjavec, Tomaž [other]
	DNB Subject Category Language

2	Universal Dependencies 2.2
	Nivre, Joakim; Abrams, Mitchell; Agić, Željko...
	In: https://hal.archives-ouvertes.fr/hal-01930733 ; 2018 (2018)
	BASE

3	Universal Dependencies 2.3
	Nivre, Joakim; Abrams, Mitchell; Agić, Željko. - : Universal Dependencies Consortium, 2018
	BASE

4	Universal Dependencies 2.2
	Nivre, Joakim; Abrams, Mitchell; Agić, Željko. - : Universal Dependencies Consortium, 2018
	BASE

5	Dictionary of Twitterese Janes-Dict 1.0
	Gantar, Polona; Škrjanec, Iza; Fišer, Darja. - : Faculty of Arts, University of Ljubljana, 2018
	BASE

6	English-Montenegrin parallel corpus of subtitles Opus-MontenegrinSubs 1.0
	Božović, Petar; Erjavec, Tomaž; Tiedemann, Jörg; Ljubešić, Nikola; Gorjanc, Vojko. - : Jožef Stefan Institute, 2018
	Abstract: This corpus contains parallel English-Montenegrin subtitles collected as part of the linguistic and translation-studies research conducted by Petar Božović for his PhD thesis "Audiovisual Translation and Elements of Culture: A Comparative Analysis of Transfer with Reception Study in Montenegro". The data and the permission to redistribute them were obtained from Radio and Television of Montenegro (http://www.rtcg.me), the public service broadcaster of Montenegro. The corpus consists of the English and Montenegrin subtitles of three TV series: House of Cards (686 minutes), Damages (2878 minutes), and Tudors (1999 minutes). In total it covers 10 seasons and 110 episodes, with a combined duration of 5,563 minutes. Sentence alignment and basic encoding were performed within the OPUS project (http://opus.nlpl.eu/MontenegrinSubs.php), while MSD tagging, lemmatisation, and TEI conversion were performed by the CLARIN.SI infrastructure. The English texts were tagged with TreeTagger (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/) and the Montenegrin texts with ReLDI Tagger (https://github.com/clarinsi/reldi-tagger) using the Serbian language model. The TreeTagger (Penn Treebank) tagset was mapped to the SPOOK MSD tagset for English (http://nl.ijs.si/spook/msd/html-en/msd-en.html). The corpus is available in TEI format and in a derived vertical format used by CQP and Manatee (Sketch Engine). The alignments for the vertical files are given separately, as tables linking the alignment elements of the two languages.
	Keyword: multilingual; parallel corpus; subtitles
	URL: http://hdl.handle.net/11356/1176
	BASE

7	Training corpus SETimes.SR 1.0
	Batanović, Vuk; Ljubešić, Nikola; Samardžić, Tanja. - : Regional Linguistic Data Initiative Centre ReLDI, 2018
	BASE

8	Spoken corpus Gos VideoLectures 3.0 (transcription)
	Verdonik, Darinka; Potočnik, Tomaž; Sepesy Maučec, Mirjam. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2018
	BASE

9	Automatically constructed multiword lexicon slMWELex v0.5
	Ljubešić, Nikola; Krek, Simon; Dobrovoljc, Kaja. - : Jožef Stefan Institute, 2018
	BASE

10	Dataset and baseline model of moderated content FRENK-MMC-RTV 1.0
	Ljubešić, Nikola; Erjavec, Tomaž; Fišer, Darja. - : Jožef Stefan Institute, 2018
	BASE

11	JRC EU DGT Translation Memory Parsebank DGT-UD 1.0
	Ljubešić, Nikola; Erjavec, Tomaž. - : Jožef Stefan Institute, 2018
	BASE

12	Training corpus ssj500k 2.1
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2018
	BASE

13	Word embeddings CLARIN.SI-embed.sl 1.0
	Ljubešić, Nikola; Erjavec, Tomaž. - : Jožef Stefan Institute, 2018
	BASE

14	Bilingual terminology extraction dataset KAS-biterm 1.0
	Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola. - : Jožef Stefan Institute, 2018
	BASE

15	Terminology identification dataset KAS-term 1.0
	Erjavec, Tomaž; Fišer, Darja; Ljubešić, Nikola. - : Jožef Stefan Institute, 2018
	BASE

16	Croatian language corpus Riznica 0.1
	Brozović Rončević, Dunja; Ćavar, Damir; Ćavar, Małgorzata. - : Institute of Croatian Language and Linguistics, 2018
	BASE

17	Training corpus hr500k 1.0
	Ljubešić, Nikola; Agić, Željko; Klubička, Filip. - : Jožef Stefan Institute, 2018
	BASE

18	Dataset and baseline model of moderated content FRENK-STYRIA-24sata 1.0
	Ljubešić, Nikola; Erjavec, Tomaž; Fišer, Darja. - : Jožef Stefan Institute, 2018
	BASE

19	hr500k – A Reference Training Corpus of Croatian.
	Erjavec, Tomaž; Ljubešić, Nikola; Klubička, Filip...
	In: Conference papers (2018)
	BASE
