Hits 121 – 140 of 174

121	Universal Dependencies 1.4
	Nivre, Joakim; Agić, Željko; Ahrenberg, Lars. - : Universal Dependencies Consortium, 2016
	BASE

122	Universal Dependencies 1.3
	Nivre, Joakim; Agić, Željko; Ahrenberg, Lars. - : Universal Dependencies Consortium, 2016
	BASE

123	Training corpus ssj500k 1.4
	Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž. - : Centre for Language Resources and Technologies, University of Ljubljana, 2016
	BASE

124	Slovenian parliamentary corpus SlovParl 1.0
	Pančur, Andrej; Šorn, Mojca; Erjavec, Tomaž. - : Institute of Contemporary History, 2016
	BASE

125	Spoken corpus Gos VideoLectures 1.0 (transcription)
	Verdonik, Darinka; Potočnik, Tomaž; Sepesy Maučec, Mirjam. - : Faculty of Electrical Engineering and Computer Science, University of Maribor, 2016
	BASE

126	Dataset of normalised Slovene text KonvNormSl 1.0
	Ljubešić, Nikola; Zupan, Katja; Fišer, Darja. - : Jožef Stefan Institute, 2016
	BASE

127	CMC training corpus Janes-Tag 1.2
	Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka. - : Jožef Stefan Institute, 2016
	BASE

128	Japanese-Slovene learner's dictionary jaSlo 3.1
	Hmeljak, Kristina; Erjavec, Tomaž; Srdanović, Irena. - : Faculty of Arts, University of Ljubljana, 2016
	BASE

129	CMC training corpus Janes-Norm 1.2
	Erjavec, Tomaž; Fišer, Darja; Čibej, Jaka. - : Jožef Stefan Institute, 2016
	BASE

130	Overview of Annotation Creation: Processes & Tools ...
	Finlayson, Mark A.; Erjavec, Tomaž. - : arXiv, 2016
	BASE

131	Modernising historical Slovene words
	Scherrer, Yves; Erjavec, Tomaž
	In: Natural Language Engineering (ISSN 1351-3249), Vol. 22, No. 6 (2016), pp. 881-905
	Abstract: We propose a language-independent word normalisation method and exemplify it on modernising historical Slovene words. Our method relies on character-level statistical machine translation (CSMT) and uses only shallow knowledge. We present relevant data on historical Slovene, consisting of two (partially) manually annotated corpora and the lexicons derived from these corpora, which contain pairs of historical and modern words. The two lexicons are disjoint: one serves as the training set, with 40,000 entries, and the other as the test set, with 20,000 entries. The data span the years 1750–1900, and the lexicons are split into fifty-year slices, with all experiments carried out separately on the three time periods. We perform two sets of experiments. In the first, a supervised setting, we build a CSMT system using the lexicon of word pairs as training data. In the second, an unsupervised setting, we simulate a scenario in which word pairs are not available. We propose a two-step method in which we first extract a noisy list of word pairs by matching historical words with cognate modern words, and then train a CSMT system on these pairs. In both sets of experiments, we also optionally use a lexicon of modern words to filter the modernisation hypotheses. While we show that both methods produce significantly better results than the baselines, their accuracy, and which method works best, correlate strongly with the age of the texts, meaning that the choice of the best method will depend on the properties of the historical language to be modernised. As an extrinsic evaluation, we also compare the quality of part-of-speech tagging and lemmatisation applied directly to historical text and to its modernised words. We show that, depending on the age of the text, annotation on modernised words also produces significantly better results than annotation on the original text.
	Keyword: info:eu-repo/classification/ddc/410
	URL: https://archive-ouverte.unige.ch/unige:82305
	BASE
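
	Illustration (not part of the catalogue record): the abstract's unsupervised setting first extracts a noisy list of word pairs by matching historical words with cognate modern words, and optionally filters modernisation hypotheses against a lexicon of modern words. The Python sketch below illustrates only those two ideas under loud assumptions. The function names, the toy words, the 0.7 threshold, the fallback rule in the filter, and the use of difflib's similarity ratio as a stand-in for cognate matching are all hypothetical choices made here; none is taken from the paper, and the CSMT training step itself is not shown.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in for cognate matching."""
    return SequenceMatcher(None, a, b).ratio()


def extract_noisy_pairs(historical_words, modern_lexicon, threshold=0.7):
    """Pair each historical word with its most similar modern word.

    Pairs scoring below the (assumed) threshold are dropped, so the result
    is a noisy training list rather than a gold-standard lexicon.
    """
    pairs = []
    for hist in historical_words:
        best = max(modern_lexicon, key=lambda m: similarity(hist, m), default=None)
        if best is not None and similarity(hist, best) >= threshold:
            pairs.append((hist, best))
    return pairs


def filter_hypotheses(ranked_hypotheses, modern_lexicon):
    """Return the best-ranked hypothesis attested in the modern lexicon.

    Falling back to the top hypothesis when none is attested is an
    assumption of this sketch, not a rule stated in the abstract.
    """
    for hyp in ranked_hypotheses:
        if hyp in modern_lexicon:
            return hyp
    return ranked_hypotheses[0] if ranked_hypotheses else None


if __name__ == "__main__":
    # Toy data for illustration only.
    modern = ["človek", "žena", "beseda"]
    print(extract_noisy_pairs(["zhlovek", "xyz"], modern))
    print(filter_hypotheses(["zhlovek", "človek"], set(modern)))
```

	In a pipeline of the kind the abstract describes, the extracted pairs would serve as training data for a character-level translation system, and the filter would be applied to that system's ranked output.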

132	Universal Dependencies 1.2
	Nivre, Joakim; Agić, Željko; Aranzabe, Maria Jesus. - : Universal Dependencies Consortium, 2015
	BASE

133	Morphological lexicon Sloleks 1.0
	Dobrovoljc, Kaja; Krek, Simon; Holozan, Peter. - : Centre for Language Resources and Technologies, University of Ljubljana, 2015
	BASE

134	Spoken corpus Gos 1.0
	Zwitter Vitez, Ana; Zemljarič Miklavčič, Jana; Krek, Simon. - : Centre for Language Resources and Technologies, University of Ljubljana, 2015
	BASE

135	Written corpus ccGigafida 1.0
	Logar, Nataša; Erjavec, Tomaž; Krek, Simon. - : Centre for Language Resources and Technologies, University of Ljubljana, 2015
	BASE

136	MULTEXT-East free lexicons 4.0
	Erjavec, Tomaž; Bruda, Ştefan; Derzhanski, Ivan. - : Jožef Stefan Institute, 2015
	BASE

137	Training corpus jos1M 1.1
	Erjavec, Tomaž; Krek, Simon. - : Jožef Stefan Institute, 2015
	BASE

138	Reference corpus of historical Slovene goo300k 1.2
	Erjavec, Tomaž. - : Jožef Stefan Institute, 2015
	BASE

139	Morphological lexicon Sloleks 1.2
	Dobrovoljc, Kaja; Krek, Simon; Holozan, Peter. - : Centre for Language Resources and Technologies, University of Ljubljana, 2015
	BASE

140	MULTEXT-East "1984" document corpus 4.0
	Erjavec, Tomaž; Bruda, Ştefan; Dimitrova, Ludmila. - : Jožef Stefan Institute, 2015
	BASE
