42 |
Bildungssprachliche Mittel. Eine Analyse von Schülertexten aus dem Sachunterricht der Primarstufe
In: 2020, 339 pp. - (Empirische Forschung im Elementar- und Primarbereich; 6) - (Koblenz-Landau, Universität, Dissertation, 2019) (2020)
43 |
Bildungssprachliche Mittel. Eine Analyse von Schülertexten aus dem Sachunterricht der Primarstufe ...
44 |
The origins of Russian-Tajik Sign Language : investigating the historical sources and transmission of a signed language in Tajikistan ...
45 |
collogetr: Collocates retriever and Collocation association measure ...
46 |
collogetr: Collocates retriever and Collocation association measure ...
47 |
Age, Task Characteristics, and Acoustic Indicators of Engagement: Investigations into the Validity of a Technology-Enhanced Speaking Test for Young Language Learners ...
48 |
A Standardized Project Gutenberg Corpus for Statistical Analysis of Natural Language and Quantitative Linguistics
In: Entropy ; Volume 22 ; Issue 1 (2020)
49 |
From Boltzmann to Zipf through Shannon and Jaynes
In: Entropy ; Volume 22 ; Issue 2 (2020)
Abstract:
The word-frequency distribution provides the fundamental building blocks that generate discourse in natural language. It is well known, from empirical evidence, that the word-frequency distribution of almost any text is described by Zipf’s law, at least approximately. Following Stephens and Bialek (2010), we interpret the frequency of any word as arising from the interaction potentials between its constituent letters. Indeed, Jaynes’ maximum-entropy principle, with the constraints given by every empirical two-letter marginal distribution, leads to a Boltzmann distribution for word probabilities, with an energy-like function given by the sum of the all-to-all pairwise (two-letter) potentials. The so-called improved iterative-scaling algorithm allows us to find the potentials from the empirical two-letter marginals. We considerably extend Stephens and Bialek’s results, applying this formalism to words of up to six letters from the English subset of the recently created Standardized Project Gutenberg Corpus. We find that the model is able to reproduce Zipf’s law, but with some limitations: the general Zipf’s power-law regime is obtained, but the probability of individual words shows considerable scattering. In this way, a pure statistical-physics framework is used to describe the probabilities of words. As a by-product, we find that both the empirical two-letter marginal distributions and the interaction-potential distributions follow well-defined statistical laws.
Keyword:
Boltzmann factor; maximum entropy principle; power laws; quantitative linguistics; two-letter interactions; word-frequency distribution; Zipf’s law
URL: https://doi.org/10.3390/e22020179
50 |
The Brevity Law as a Scaling Law, and a Possible Origin of Zipf’s Law for Word Frequencies
In: Entropy ; Volume 22 ; Issue 2 (2020)
51 |
Measuring coselectional constraint in learner corpora: A graph-based approach
52 |
Sociolinguistic Research Methodology: a Framework Design ...
53 |
Quranic studies made in Austria : approaching quantitative Arabic linguistics
54 |
Assessing Topical Homogeneity with Word Embedding and Distance Matrices
In: School of Information Studies - Faculty Scholarship (2020)
55 |
Corpus-based approach meets LFG: Puzzling voice alternation in Indonesian
56 |
The origins of Russian-Tajik Sign Language : investigating the historical sources and transmission of a signed language in Tajikistan
57 |
A usage-based approach to relativization: an investigation of advanced-learners’ written production of relative clauses in Japanese
59 |
THE PEOPLE WHO “BURN”: “COMMUNICATION,” UNITY, AND CHANGE IN BELARUSIAN DISCOURSE ON PUBLIC CREATIVITY
In: Doctoral Dissertations (2020)
60 |
A Quantitative Research of the Book of Odes (Shījīng 詩經): the Discovery of the Underlying Rhythm in the Incentive Process (xīng 興) ; 《詩經》的量化研究:發掘興體詩的隱藏節奏
In: Journal of Digital Archives and Digital Humanities, Taiwanese Association for Digital Humanity & Ainosco, 2019, 4, pp. 49-70 ; ISSN: 2616-5732 ; https://hal.archives-ouvertes.fr/hal-02474312 ; http://www.airitilibrary.com/Publication/alPublicationJournal?PublicationID=P20180801001 (2019)