44 |
Frequency lists of word parts from the GOS 1.0 corpus
|
|
|
|
Abstract:
Frequency lists of words split into word parts were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all lemmas, lower-case word forms or normalized word forms occurring in the corpus, split into their initial or final part (i.e. the initial or final string of 1, 2, 3, 4 or 5 characters in the word) and the rest. In addition, the lists also contain absolute and relative frequencies, percentages, and distribution across the text-types included in the corpus taxonomy. The lists were extracted for each part-of-speech category. For each part-of-speech, a total of 30 lists were extracted: 1) 10 lists for initial or final word parts extracted from lemmas, 2) 10 lists for initial or final word parts extracted from lower-case word forms, 3) 10 lists for initial or final word parts extracted from normalized word forms. In addition, 30 lists were extracted from all words (regardless of their part-of-speech category).
|
|
Keywords:
final part of the word; initial part of the word; morphology; Slovenian language; spoken corpus; word parts
|
|
URL: http://hdl.handle.net/11356/1270
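The splitting described in the abstract of this record can be illustrated with a minimal Python sketch. It is not the LIST tool's implementation: the function names `split_word` and `part_frequencies` are hypothetical, and skipping words shorter than the requested part length is an assumption, since the record does not say how such words are treated. The sketch only shows how an initial or final part of 1 to 5 characters is separated from the rest of the word, and why two sides times five lengths give 10 lists per word representation (lemma, lower-case form, normalized form), hence 30 in total.

```python
from collections import Counter

# Two sides times five part lengths = 10 list configurations per word
# representation; three representations (lemmas, lower-case word forms,
# normalized word forms) give the 30 lists mentioned in the abstract.
LIST_CONFIGS = [(side, n) for side in ("initial", "final") for n in range(1, 6)]


def split_word(word, n, side="initial"):
    """Split `word` into (part, rest), where part is n characters long.

    Assumption (not stated in the record): words shorter than n are
    skipped, signalled here by returning None.
    """
    if n < 1 or len(word) < n:
        return None
    if side == "initial":
        return word[:n], word[n:]
    if side == "final":
        return word[-n:], word[:-n]
    raise ValueError("side must be 'initial' or 'final'")


def part_frequencies(words, n, side="initial"):
    """Map each word part to (absolute frequency, relative frequency)."""
    counts = Counter()
    for word in words:
        split = split_word(word, n, side)
        if split is not None:
            counts[split[0]] += 1
    total = sum(counts.values())
    return {part: (count, count / total) for part, count in counts.items()}
```

For example, `split_word("beseda", 3)` separates the initial part `bes` from the rest `eda`, while `split_word("beseda", 3, "final")` separates the final part `eda` from the rest `bes`. The caller decides which representation (lemma, lower-case form or normalized form) to pass in.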
|
|
BASE
|
|
45 |
Corpus extraction tool LIST 1.2
|
|
Krsnik, Luka; Arhar Holdt, Špela; Čibej, Jaka. Centre for Language Resources and Technologies, University of Ljubljana; Faculty of Computer and Information Science, University of Ljubljana; Jožef Stefan Institute, 2019
|
|
BASE
|
|
49 |
Frequency lists of character-level n-grams from the Gigafida 2.0 corpus
|
|
|
|
BASE
|
|
51 |
Developmental corpus (without language corrections) Šolar 2.0 Clear
|
|
|
|
BASE
|
|
52 |
Frequency lists of word-level n-grams from the Gigafida 2.0 corpus
|
|
|
|
BASE
|
|
53 |
Frequency lists of word-level n-grams from the GOS 1.0 corpus
|
|
|
|
BASE
|
|
54 |
Frequency lists of character-level n-grams from the GOS 1.0 corpus
|
|
|
|
BASE
|
|
55 |
Corpus extraction tool LIST 1.0
|
|
Krsnik, Luka; Arhar Holdt, Špela; Čibej, Jaka. Centre for Language Resources and Technologies, University of Ljubljana; Faculty of Computer and Information Science, University of Ljubljana; Jožef Stefan Institute, 2019
|
|
BASE
|
|
60 |
The ELEXIS Interface for Interoperable Lexical Resources ...
|
|
|
|
BASE
|
|