Hits 1 – 8 of 8

1	Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

2	Frequency lists of words from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

3	Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

4	Frequency lists of word parts from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
	BASE

5	Frequency lists of words from the GOS 1.0 corpus
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja; Krek, Simon. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019. : Jožef Stefan Institute, 2019
	Abstract: Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part-of-speech, two lists were extracted: 1) one containing lemmas and their text-type distribution, 2) one containing lower-case word forms as well as their normalized forms, lemmas, and morphosyntactic tags along with their text-type distribution. In addition, four lists were extracted from all words (regardless of their part-of-speech category): 1) a list of all lemmas along with their part-of-speech category and text-type distribution; 2) a list of all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) a list of all lower-case word forms with their normalized word forms, lemmas, part-of-speech categories, and text-type distribution; 4) a list of all morphosyntactic tags and their text-type distribution (the tags are also split into several columns).
	Keyword: frequency list; lemmas; normalized forms; Slovenian language; spoken corpus; words
	URL: http://hdl.handle.net/11356/1269
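	Illustration: the abstract describes each list as pairing a word with its absolute frequency, relative frequency, percentage, and distribution across text-types. The following is a minimal sketch of how such a row could be computed. It is not the LIST tool's implementation. The function name `frequency_list`, the per-million scaling of the relative frequency, and the text-type labels "private" and "public" are assumptions made for the example.

```python
from collections import Counter, defaultdict


def frequency_list(tagged_tokens):
    """Build frequency rows from (token, text_type) pairs.

    Hypothetical helper: the row layout is an assumption modelled on the
    abstract, not the actual column layout of the published lists.
    """
    totals = Counter()
    by_type = defaultdict(Counter)
    for token, text_type in tagged_tokens:
        form = token.lower()  # the lists use lower-case word forms
        totals[form] += 1
        by_type[form][text_type] += 1

    n = sum(totals.values())
    rows = []
    for form, absolute in totals.most_common():
        rows.append({
            "form": form,
            "absolute": absolute,
            # Assumed scaling: occurrences per million tokens.
            "per_million": absolute / n * 1_000_000,
            "percentage": absolute / n * 100,
            "by_text_type": dict(by_type[form]),
        })
    return rows


# Toy input; the text-type labels are invented for illustration.
sample = [("Ja", "private"), ("ja", "public"), ("no", "private"), ("ja", "private")]
rows = frequency_list(sample)
for row in rows:
    print(row)
```

	Lemmas, normalized forms, and morphosyntactic tags would need an annotated corpus as input, so they are left out of this sketch.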
	BASE

6	Frequency lists of word parts from the GOS 1.0 corpus
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019. : Jožef Stefan Institute, 2019
	BASE

7	Frequency lists of word-level n-grams from the GOS 1.0 corpus
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019. : Jožef Stefan Institute, 2019
	BASE

8	Frequency lists of character-level n-grams from the GOS 1.0 corpus
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019. : Jožef Stefan Institute, 2019
	BASE
