1 |
Finding Concept-specific Biases in Form–Meaning Associations ...
Abstract:
This work presents an information-theoretic operationalisation of cross-linguistic non-arbitrariness. It is not a new idea that there are small, cross-linguistic associations between the forms and meanings of words. For instance, it has been claimed (Blasi et al., 2016) that the word for "tongue" is more likely than chance to contain the phone [l]. By controlling for the influence of language family and geographic proximity within a very large concept-aligned, cross-lingual lexicon, we extend methods previously used to detect within-language non-arbitrariness (Pimentel et al., 2019) to measure cross-linguistic associations. We find that there is a significant effect of non-arbitrariness, but it is unsurprisingly small (less than 0.5% on average according to our information-theoretic estimate). We also provide a concept-level analysis which shows that a quarter of the concepts considered in our work exhibit a significant level of cross-linguistic non-arbitrariness. In sum, the paper provides new methods to ...
Accepted at NAACL 2021. This is the camera-ready version. Code is available at https://github.com/rycolab/form-meaning-associations ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2104.06325 https://arxiv.org/abs/2104.06325
BASE
2 |
Finding Concept-specific Biases in Form–Meaning Associations ...
3 |
Disambiguatory Signals are Stronger in Word-initial Positions ...
4 |
Finding Concept-specific Biases in Form–Meaning Associations
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2021)
5 |
Disambiguatory Signals are Stronger in Word-initial Positions
In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (2021)
6 |
Disambiguatory Signals are Stronger in Word-initial Positions ...
7 |
Finding Concept-specific Biases in Form–Meaning Associations ...
11 |
Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset ...
12 |
Phonotactic Complexity and Its Trade-offs
In: Transactions of the Association for Computational Linguistics, 8 (2020)
13 |
Explaining vowel inventory tendencies via simulation: finding a role for quantal locations and formant normalization
In: North East Linguistics Society (2020)
14 |
Are All Languages Equally Hard to Language-Model?
In: Proceedings of the Society for Computation in Linguistics (2019)
15 |
Rethinking Phonotactic Complexity
In: Proceedings of the Society for Computation in Linguistics (2019)
17 |
Graph-Based Word Alignment for Clinical Language Evaluation
In: Computational Linguistics (2015)
18 |
Computational Analysis of Trajectories of Linguistic Development in Autism
19 |
Distributional semantic models for the evaluation of disordered language
20 |
CSSG: Learning within NLP Pipelines for Scalable Data Mining and Information Extraction
In: DTIC (2011)