
Search in the Catalogues and Directories

Hits 1 – 20 of 120

1
Frequency, Informativity and Word Length: Insights from Typologically Diverse Corpora
In: Entropy; Volume 24; Issue 2; Pages: 280 (2022)
Abstract: Zipf’s law of abbreviation, which posits a negative correlation between word frequency and length, is one of the most famous and robust cross-linguistic generalizations. At the same time, it has been shown that contextual informativity (average surprisal given previous context) is more strongly correlated with word length, although this tendency is not observed consistently, depending on several methodological choices. The present study examines a more diverse sample of languages than previous studies (Arabic, Finnish, Hungarian, Indonesian, Russian, Spanish and Turkish). I use large web-based corpora from the Leipzig Corpora Collection to estimate word lengths in UTF-8 characters and in phonemes (for some of the languages), as well as word frequency, informativity given previous word and informativity given next word, applying different methods of bigram processing. The results show different correlations between word length and the corpus-based measure for different languages. I argue that these differences can be explained by the properties of noun phrases in a language, most importantly, by the order of heads and modifiers and their relative morphological complexity, as well as by orthographic conventions.
Keyword: corpora; frequency; informativity; linguistic typology; n-grams; Zipf’s law of abbreviation
URL: https://doi.org/10.3390/e24020280
BASE
2
Romance morphology in diachrony through Google n-grams ...
Radimský, Jan. - : Open Science Framework, 2022
BASE
3
Multi-word units (and tokenization more generally): a multi-dimensional and largely information-theoretic approach
In: Lexis: Journal in English Lexicology, Vol 19 (2022)
BASE
4
Meta-Learner for Amharic Sentiment Classification
In: Applied Sciences ; Volume 11 ; Issue 18 (2021)
BASE
5
You are kidding right? The English present progressive as a stance marker in film dialogue ...
Ghia, Elisa. - : University of Salento, 2021
BASE
6
An interactive visualization of Google Books Ngrams with R and Shiny : exploring a(n) historical increase in onset strength in a(n) huge database
BASE
7
An interactive visualization of Google Books Ngrams with R and Shiny : exploring a(n) historical increase in onset strength in a(n) huge database
Vetter, Fabian; Schlüter, Julia. - : Otto-Friedrich-Universität, 2021. : Bamberg, 2021
BASE
8
DIGITAL TECHNOLOGIES FOR GRAMMATICAL ERROR CORRECTION: DEEP LEARNING METHODS & SYNTACTIC N-GRAMS
In: Мова; No. 35 (2021); 2414-9489; 2307-4558
BASE
9
You are kidding right? The English present progressive as a stance marker in film dialogue
In: Lingue e Linguaggi; Volume 44(2021); 183-202 (2021)
BASE
10
Visualizing the development of prose styles in Horse Manuals from Early Modern English to Present-Day English
In: Journal of Data Mining and Digital Humanities (EISSN: 2416-5999), Episciences.org, 2020, Special issue on Visualisations in Historical Linguistics, pp. 1-33; https://hal.archives-ouvertes.fr/hal-02283138
BASE
11
Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
12
List of formulaic sequences in spoken Slovenian
Dobrovoljc, Kaja; Roblek, Rebeka; Vianello, Chiara. - : Jožef Stefan Institute, 2020. : Centre for Language Resources and Technologies, University of Ljubljana, 2020
BASE
13
Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. - : Centre for Language Resources and Technologies, University of Ljubljana, 2020. : Jožef Stefan Institute, 2020
BASE
14
List of formulaic sequences in standard written Slovenian
Dobrovoljc, Kaja; Roblek, Rebeka; Vianello, Chiara. - : Jožef Stefan Institute, 2020. : Centre for Language Resources and Technologies, University of Ljubljana, 2020
BASE
15
Visualizing the development of prose styles in Horse Manuals from Early Modern English to Present-Day English
In: Journal of Data Mining and Digital Humanities, Special issue on Visualisations in Historical Linguistics (2020)
BASE
16
An interactive visualization of Google Books Ngrams with R and Shiny: Exploring a(n) historical increase in onset strength in a(n) huge database
In: Journal of Data Mining and Digital Humanities, Special issue on Visualisations in Historical Linguistics (2020)
BASE
17
The necessity modals have to, must, need to and should: using n-grams to help identify common and distinct semantic and pragmatic aspects. 11.2: 220-243
In: Constructions and Frames (ISSN: 1876-1933; EISSN: 1876-1941), John Benjamins, 2019, 11, pp. 220-243. ⟨10.1075/cf.00029.cap⟩; https://hal.archives-ouvertes.fr/hal-02369306
BASE
18
The necessity modals have to, must, need to and should: using n-grams to help identify common and distinct semantic and pragmatic aspects
In: Constructions and Frames (ISSN: 1876-1933; EISSN: 1876-1941), John Benjamins, 2019, 11 (2), pp. 220-243. ⟨10.1075/cf.00029.cap⟩; https://hal.archives-ouvertes.fr/hal-02501498
BASE
19
Keywords and n-grams from a textbook corpus
Kosem, Iztok; Pori, Eva; Arhar Holdt, Špela. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019
BASE
20
Dependency tree extraction tool STARK 1.0
Krsnik, Luka; Dobrovoljc, Kaja; Robnik-Šikonja, Marko. - : Centre for Language Resources and Technologies, University of Ljubljana, 2019. : Faculty of Arts, University of Ljubljana, 2019. : Faculty of Computer and Information Science, University of Ljubljana, 2019
BASE
