Search in the Catalogues and Directories

Page: 1 2 3 4 5 6

Hits 1 – 20 of 120

1	Frequency, Informativity and Word Length: Insights from Typologically Diverse Corpora
	Natalia Levshina
	In: Entropy; Volume 24; Issue 2; Article 280 (2022)
	Abstract: Zipf’s law of abbreviation, which posits a negative correlation between word frequency and length, is one of the most famous and robust cross-linguistic generalizations. At the same time, it has been shown that contextual informativity (average surprisal given previous context) is more strongly correlated with word length, although this tendency is not observed consistently, depending on several methodological choices. The present study examines a more diverse sample of languages than previous studies (Arabic, Finnish, Hungarian, Indonesian, Russian, Spanish and Turkish). I use large web-based corpora from the Leipzig Corpora Collection to estimate word lengths in UTF-8 characters and in phonemes (for some of the languages), as well as word frequency, informativity given previous word and informativity given next word, applying different methods of bigram processing. The results show different correlations between word length and the corpus-based measures for different languages. I argue that these differences can be explained by the properties of noun phrases in a language, most importantly, by the order of heads and modifiers and their relative morphological complexity, as well as by orthographic conventions.
	Keywords: corpora; frequency; informativity; linguistic typology; n-grams; Zipf’s law of abbreviation
	URL: https://doi.org/10.3390/e24020280
	BASE

2	Romance morphology in diachrony through Google n-grams ...
	Radimský, Jan. Open Science Framework, 2022
	BASE

3	Multi-word units (and tokenization more generally): a multi-dimensional and largely information-theoretic approach
	Stefan Th. Gries
	In: Lexis: Journal in English Lexicology, Vol 19 (2022)
	BASE

4	Meta-Learner for Amharic Sentiment Classification
	Girma Neshir; Andreas Rauber; Solomon Atnafu
	In: Applied Sciences; Volume 11; Issue 18 (2021)
	BASE

5	You are kidding right? The English present progressive as a stance marker in film dialogue ...
	Ghia, Elisa. University of Salento, 2021
	BASE

6	An interactive visualization of Google Books Ngrams with R and Shiny: exploring a(n) historical increase in onset strength in a(n) huge database
	Vetter, Fabian; Schlüter, Julia. 2021
	BASE

7	An interactive visualization of Google Books Ngrams with R and Shiny: exploring a(n) historical increase in onset strength in a(n) huge database
	Vetter, Fabian; Schlüter, Julia. Bamberg: Otto-Friedrich-Universität, 2021
	BASE

8	Digital Technologies for Grammatical Error Correction: Deep Learning Methods & Syntactic N-grams
	Pozharytska, Olena; Troitskyi, Kyrylo
	In: Мова (Mova); No. 35 (2021); ISSN 2414-9489, 2307-4558
	BASE

9	You are kidding right? The English present progressive as a stance marker in film dialogue
	Ghia, Elisa
	In: Lingue e Linguaggi; Volume 44 (2021); pp. 183-202
	BASE

10	Visualizing the development of prose styles in Horse Manuals from Early Modern English to Present-Day English
	Lubbers, Thijs; Los, Bettelou
	In: Journal of Data Mining and Digital Humanities (EISSN: 2416-5999), Episciences.org, 2020, Special issue on Visualisations in Historical Linguistics, pp. 1-33; https://hal.archives-ouvertes.fr/hal-02283138
	BASE

11	Frequency lists of character-level n-grams from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana; Jožef Stefan Institute, 2020
	BASE

12	List of formulaic sequences in spoken Slovenian
	Dobrovoljc, Kaja; Roblek, Rebeka; Vianello, Chiara. Jožef Stefan Institute; Centre for Language Resources and Technologies, University of Ljubljana, 2020
	BASE

13	Frequency lists of word-level n-grams from the GOS 1.0 corpus 1.1
	Čibej, Jaka; Arhar Holdt, Špela; Dobrovoljc, Kaja. Centre for Language Resources and Technologies, University of Ljubljana; Jožef Stefan Institute, 2020
	BASE

14	List of formulaic sequences in standard written Slovenian
	Dobrovoljc, Kaja; Roblek, Rebeka; Vianello, Chiara. Jožef Stefan Institute; Centre for Language Resources and Technologies, University of Ljubljana, 2020
	BASE

15	Visualizing the development of prose styles in Horse Manuals from Early Modern English to Present-Day English
	Thijs Lubbers; Bettelou Los
	In: Journal of Data Mining and Digital Humanities, Special issue on Visualisations in Historical Linguistics (2020)
	BASE

16	An interactive visualization of Google Books Ngrams with R and Shiny: Exploring a(n) historical increase in onset strength in a(n) huge database
	Julia Schlüter; Fabian Vetter
	In: Journal of Data Mining and Digital Humanities, Special issue on Visualisations in Historical Linguistics (2020)
	BASE

17	The necessity modals have to, must, need to and should: using n-grams to help identify common and distinct semantic and pragmatic aspects
	Cappelle, Bert; Depraetere, Ilse; Lesuisse, Mégane
	In: Constructions and Frames (ISSN: 1876-1933; EISSN: 1876-1941), John Benjamins, 2019, 11, pp. 220-243; doi:10.1075/cf.00029.cap; https://hal.archives-ouvertes.fr/hal-02369306
	BASE

18	The necessity modals have to, must, need to and should: using n-grams to help identify common and distinct semantic and pragmatic aspects
	Cappelle, Bert; Depraetere, Ilse; Lesuisse, Mégane
	In: Constructions and Frames (ISSN: 1876-1933; EISSN: 1876-1941), John Benjamins, 2019, 11 (2), pp. 220-243; doi:10.1075/cf.00029.cap; https://hal.archives-ouvertes.fr/hal-02501498
	BASE

19	Keywords and n-grams from a textbook corpus
	Kosem, Iztok; Pori, Eva; Arhar Holdt, Špela. Centre for Language Resources and Technologies, University of Ljubljana, 2019
	BASE

20	Dependency tree extraction tool STARK 1.0
	Krsnik, Luka; Dobrovoljc, Kaja; Robnik-Šikonja, Marko. Centre for Language Resources and Technologies, University of Ljubljana; Faculty of Arts, University of Ljubljana; Faculty of Computer and Information Science, University of Ljubljana, 2019
	BASE

© 2013 - 2024 Lin|gu|is|tik