
Search in the Catalogues and Directories

Hits 1 – 8 of 8

1
Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2020; Jožef Stefan Institute, 2020
BASE
2
Frequency lists of words from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2020; Jožef Stefan Institute, 2020
BASE
3
Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja; Krek, Simon. Centre for Language Resources and Technologies, University of Ljubljana, 2020; Jožef Stefan Institute, 2020
Abstract: Frequency lists of word-level n-grams (or word sets) were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all word-level 2-, 3-, 4- and 5-grams occurring in the corpus, along with their absolute and relative frequencies, percentages, distribution across the text types included in the corpus taxonomy, and six collocation measures: Dice, t-score, MI, MI3, logDice, and simple LL. The n-grams were extracted from lower-case word forms, standardized word forms, and morphosyntactic tags. For large lists, shortened versions containing the first 150,000 lines were also prepared to facilitate further processing in spreadsheet software. Compared to the previous version (http://hdl.handle.net/11356/1271), this version fixes several typos and replaces all instances of "normalized forms" with the more adequate term "standardized forms" (as used in the SSJ project).
Keywords: morphosyntactic tags; n-grams; Slovenian language; spoken corpus; standardized forms; word forms; word sets; words
URL: http://hdl.handle.net/11356/1365
BASE
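The collocation measures named in the abstract above can be illustrated for the two-word case. The sketch below assumes commonly used textbook definitions of Dice, t-score, MI, MI3, logDice, and simple log-likelihood; the exact formulas used by the LIST tool are not stated in this record and may differ, and longer n-grams would need a generalization that is not shown here. The names `BigramCounts` and `association_scores` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BigramCounts:
    """Raw counts for one bigram (hypothetical container, not part of LIST)."""
    f_xy: int  # frequency of the bigram "x y"
    f_x: int   # frequency of the first word
    f_y: int   # frequency of the second word
    n: int     # corpus size in tokens


def association_scores(c: BigramCounts) -> dict[str, float]:
    """Compute six association scores under assumed standard definitions."""
    # Expected bigram frequency if the two words were independent.
    expected = c.f_x * c.f_y / c.n
    dice = 2 * c.f_xy / (c.f_x + c.f_y)
    return {
        "Dice": dice,
        "t-score": (c.f_xy - expected) / math.sqrt(c.f_xy),
        "MI": math.log2(c.f_xy / expected),
        "MI3": math.log2(c.f_xy ** 3 / expected),
        # logDice rescales Dice onto a scale with a theoretical maximum of 14.
        "logDice": 14 + math.log2(dice),
        # Simple log-likelihood compares observed and expected frequency only.
        "simple LL": 2 * (c.f_xy * math.log(c.f_xy / expected)
                          - (c.f_xy - expected)),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only (not taken from the GOS corpus).
    scores = association_scores(BigramCounts(f_xy=8, f_x=16, f_y=16, n=1024))
    for name, value in scores.items():
        print(f"{name}: {value:.4f}")
```

All six scores are derived from the same four counts, so a single row of a frequency list that carries these counts would be enough to recompute them. The function assumes `f_xy` is at least 1, which holds for any n-gram that actually occurs in a list.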
4
Frequency lists of word parts from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2020; Jožef Stefan Institute, 2020
BASE
5
Frequency lists of words from the GOS 1.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2019; Jožef Stefan Institute, 2019
BASE
6
Frequency lists of word parts from the GOS 1.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2019; Jožef Stefan Institute, 2019
BASE
7
Frequency lists of word-level n-grams from the GOS 1.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2019; Jožef Stefan Institute, 2019
BASE
8
Frequency lists of character-level n-grams from the GOS 1.0 corpus
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana, 2019; Jožef Stefan Institute, 2019
BASE

© 2013 - 2024 Lin|gu|is|tik