1 |
Updating the dictionary: Semantic change identification based on change in bigrams over time
|
|
|
|
In: Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave, Vol 8, Iss 2 (2020)
|
|
Abstract:
We investigate a method for systematically updating a Danish monolingual dictionary with new semantic information on lemmas it already includes, based on the hypothesis that variation in bigrams over time in a corpus might indicate changes in the meaning of one of the words. The method combines corpus statistics with manual annotations. The first step consists of measuring the collocational change in a homogeneous newswire corpus with texts from a 14-year time span, 2005 through 2018, by calculating all the statistically significant bigrams. These are then applied to a new version of the corpus that is split into one sub-corpus per year. We then collect all the bigrams that do not appear at all in the first three years but appear at least 20 times in the following 11 years. The output, a dataset of 745 bigrams considered to be potentially new in Danish, is double-annotated and, depending on the annotations and the inter-annotator agreement, either discarded or divided into groups of relevant data for further investigation. We then carry out a more thorough lexicographical study of the bigrams in order to determine the degree to which they support the identification of new senses and lead to revised sense inventories for at least one of the words. Furthermore, we study the relation between the revisions carried out, the annotation values and the degree of inter-annotator agreement. Finally, we compare the resulting updates of the dictionary with Cook et al. (2013), and discuss whether the method might lead to a more consistent way of revising and updating the dictionary in the future.
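The year-based filtering step in the abstract can be illustrated with a minimal Python sketch. It assumes the yearly sub-corpora are already tokenized into sentences and that the set of statistically significant bigrams has already been computed over the whole corpus (the abstract does not name the association measure, so that set is simply taken as given). It also assumes that "at least 20 times in the following 11 years" means a total over 2008 through 2018 rather than a per-year minimum. The function names and the toy Danish data are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Bigram = Tuple[str, str]
Sentence = List[str]


def bigrams(tokens: Sentence) -> Iterable[Bigram]:
    """Yield adjacent token pairs from one tokenized sentence."""
    return zip(tokens, tokens[1:])


def count_by_year(corpus: Dict[int, List[Sentence]]) -> Dict[int, Counter]:
    """Count bigram occurrences separately in each yearly sub-corpus."""
    counts: Dict[int, Counter] = {}
    for year, sentences in corpus.items():
        year_counts: Counter = Counter()
        for sentence in sentences:
            year_counts.update(bigrams(sentence))
        counts[year] = year_counts
    return counts


def candidate_new_bigrams(
    counts: Dict[int, Counter],
    significant: Set[Bigram],
    early_years: Iterable[int],
    later_years: Iterable[int],
    min_count: int = 20,
) -> List[Bigram]:
    """Keep significant bigrams absent from every early year but frequent later.

    Assumption: min_count is compared with the total frequency summed over
    all later years, not with the frequency in each individual year.
    """
    early: Counter = Counter()
    for year in early_years:
        early.update(counts.get(year, Counter()))
    later: Counter = Counter()
    for year in later_years:
        later.update(counts.get(year, Counter()))
    return sorted(
        bg for bg in significant if early[bg] == 0 and later[bg] >= min_count
    )


# Hypothetical toy data: one sentence per year, plus a bigram that only
# starts appearing in 2008 (twice a year, i.e. 22 times over 2008 through 2018).
corpus: Dict[int, List[Sentence]] = {
    year: [["en", "ny", "ting"]] for year in range(2005, 2019)
}
for year in range(2008, 2019):
    corpus[year] += [["sociale", "medier"]] * 2

counts = count_by_year(corpus)
significant = {("sociale", "medier"), ("en", "ny")}
result = candidate_new_bigrams(
    counts, significant, range(2005, 2008), range(2008, 2019)
)
print(result)  # [('sociale', 'medier')]
```

Counting per year first and merging afterwards keeps each yearly sub-corpus available for other year-by-year inspection. Under the total-count reading assumed here, a bigram seen only 19 times after 2007 would be dropped; a per-year reading of the threshold would need a different check.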
|
|
Keyword:
bigrams; corpus statistics; Danish; dictionary update; P1-1091; Philology. Linguistics; semantic change
|
|
URL: https://doi.org/10.4312/slo2.0.2020.2.112-138 https://doaj.org/article/6e9d7673ee0a4101bf814df5665368b0
|
|
|
2 |
Numerical orthographic coding: merging Open Bigrams and Spatial Coding theories
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-01687304 (2019)
|
|
|
4 |
Topic models: adding bigrams and taking account of the similarity between unigrams and bigrams ... [Тематические модели: добавление биграмм и учет сходства между униграммами и биграммами ...]
|
|
Нокель, М.А.; Лукашевич, Н.В. Research Computing Center of Lomonosov Moscow State University, 2015
|
|
|
7 |
of London, UK Reviewed by:
|
|
|
|
In: ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/10/6f/Front_Psychol_2011_Jun_21_2_136.tar.gz (2011)
|
|
|
8 |
When ‘more’ in statistical learning means ‘less’ in language: individual differences in predictive processing of adjacent dependencies
|
|
|
|
In: http://cnl.psych.cornell.edu/pubs/2010-mc-cogsci.pdf (2010)
|
|
|
9 |
A Controlled Skip Parser
|
|
|
|
In: ftp://ftp.isi.edu/pub/kyamada/skip.ps (1996)
|
|
|
11 |
Word recognition in reading - Doctoral thesis (Doctorat de Troisième Cycle) in Psychology [L'identification des mots au cours de la lecture]
|
|
|
|
In: https://hal.archives-ouvertes.fr/tel-01273401 ; Neural networks [cs.NE]. Université de Provence (Aix-Marseille 1), 1983. In French (1983)
|
|
|
12 |
Identifying Urdu Complex Predication via Bigram Extraction
|
|
|
|
In: http://kops.uni-konstanz.de/bitstream/handle/123456789/29101/Butt_0-253654.pdf%3Bjsessionid%3DFB808089C5FA051ABD663CB35F558DA5?sequence%3D2
|
|
|
13 |
Identifying Urdu Complex Predication via Bigram Extraction. Miriam Butt, Tina Bögel
|
|
|
|
In: http://aclweb.org/anthology/C/C12/C12-1026.pdf
|
|
|
14 |
Sentiment Analysis of Movie Reviews using POS tags and Term Frequencies
|
|
|
|
In: http://research.ijcaonline.org/volume96/number25/pxc3897048.pdf
|
|
|
|
|