1
The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...
BASE
2
Data for: The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...
3
The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions ...
Abstract:
One of the striking commonalities between languages is the way word frequencies are distributed. Across languages, word frequencies follow a Zipfian distribution, showing a power law relation between a word's frequency and its rank (Zipf, 1949). Intuitively, this means that languages have relatively few high-frequency words and many low-frequency ones. While studied extensively, little work has explored the learnability consequences of the greater predictability of words in such distributions. Here, we propose such distributions confer a learnability advantage for word segmentation, a foundational aspect of language acquisition. We capture the greater predictability of words using the information-theoretic notion of efficiency, which tells us how predictable a distribution is relative to a uniform one. We first use corpus analyses to show that child-directed speech is similarly predictable across fifteen different languages. We then experimentally investigate the impact of distribution predictability on ...
Keyword:
150; Information theory; Language acquisition; Statistical learning; Word segmentation; Zipf's law
URL: https://dx.doi.org/10.23668/psycharchives.3075 https://www.psycharchives.org/handle/20.500.12034/2693
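The abstract describes capturing predictability with an information-theoretic efficiency measure that compares a distribution to a uniform one. The listing does not give the formula, so the following is only a minimal sketch under an assumption: that efficiency can be taken as Shannon entropy divided by the maximum entropy for the same number of word types (normalized entropy). The function names `zipf_probs`, `entropy_bits`, and `efficiency`, and the `exponent` parameter, are hypothetical and chosen for illustration.

```python
import math


def zipf_probs(n_types, exponent=1.0):
    """Probabilities for ranks 1..n_types under a power law p(r) ~ 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability items contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def efficiency(probs):
    """ASSUMED definition: entropy relative to a uniform distribution over the
    same number of types (H / log2(N)). A uniform distribution scores 1.0;
    more skewed, and hence more predictable, distributions score lower."""
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two types to normalize entropy")
    return entropy_bits(probs) / math.log2(n)


if __name__ == "__main__":
    n = 12  # hypothetical vocabulary size, not taken from the study
    uniform = [1.0 / n] * n
    zipfian = zipf_probs(n)
    print("uniform efficiency:", efficiency(uniform))
    print("zipfian efficiency:", efficiency(zipfian))
```

Under this assumed definition, a Zipfian distribution over the same vocabulary scores below the uniform one, which is one way to read the abstract's claim that words are more predictable in such distributions.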