4. The Orange workflow for observing collocation trends ColTrend 1.0
5. Slovene ontology of semantic types for nouns SLONEST-noun 1.0
7. Multiword Expressions lexicon extracted from the Gigafida 2.1 corpus
8. The Orange workflow for observing collocation clusters ColEmbed 1.0
10. Frequency lists of collocations from the Gigafida 2.1 corpus
14. Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
15. Frequency lists of words from the GOS 1.0 corpus 1.1
Abstract:
Frequency lists of words were extracted from the GOS 1.0 Corpus of Spoken Slovene (http://hdl.handle.net/11356/1040) using the LIST corpus extraction tool (http://hdl.handle.net/11356/1227). The lists contain all words occurring in the corpus along with their absolute and relative frequencies, percentages, and distribution across the text types included in the corpus taxonomy. Two lists were extracted for each part-of-speech category: 1) one containing lemmas and their text-type distribution; 2) one containing lower-case word forms with their standardized forms, lemmas, and morphosyntactic tags, along with their text-type distribution. In addition, four lists were extracted from all words, regardless of their part-of-speech category: 1) a list of all lemmas along with their part-of-speech category and text-type distribution; 2) a list of all lower-case word forms with their lemmas, part-of-speech categories, and text-type distribution; 3) a list of all lower-case word forms with their standardized word forms, lemmas, part-of-speech categories, and text-type distribution; 4) a list of all morphosyntactic tags and their text-type distribution (the tags are also split into several columns). Compared to the previous version (http://hdl.handle.net/11356/1269), this version fixes several typos and replaces all instances of "normalized forms" with the more appropriate term "standardized forms" (as used in the SSJ project).
Keywords:
frequency list; lemmas; Slovenian language; spoken corpus; standardized forms; words
URL: http://hdl.handle.net/11356/1364
20. Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1