Catalogue search
Hits 1 – 4 of 4
1
Semantic Relatedness and Taxonomic Word Embeddings ...
Kacmajor, Magdalena; Kelleher, John D.; Klubicka, Filip
arXiv, 2020
BASE
2
English WordNet Taxonomic Random Walk Pseudo-Corpora
Klubicka, Filip; Maldonado, Alfredo; Mahalunkar, Abhijit; ...
In: Conference papers (2020)
BASE
3
Synthetic, Yet Natural: Properties of WordNet Random Walk Corpora and the impact of rare words on embedding performance
Klubicka, Filip; Mahalunkar, Abhijit; Maldonado, Alfredo; ...
In: Conference papers (2019)
BASE
4
Size Matters: The Impact of Training Size in Taxonomically-Enriched Word Embeddings
Maldonado, Alfredo; Klubicka, Filip; Kelleher, John D.
In: Articles (2019)
Abstract:
Word embeddings trained on natural corpora (e.g., newspaper collections, Wikipedia or the Web) excel at capturing thematic similarity (“topical relatedness”) on word pairs such as ‘coffee’ and ‘cup’ or ‘bus’ and ‘road’. However, they are less successful on pairs showing taxonomic similarity, like ‘cup’ and ‘mug’ (near synonyms) or ‘bus’ and ‘train’ (types of public transport). Moreover, purely taxonomy-based embeddings (e.g., those trained on a random walk of WordNet’s structure) outperform natural-corpus embeddings in taxonomic similarity but underperform them in thematic similarity. Previous work suggests that performance gains in both types of similarity can be achieved by enriching natural-corpus embeddings with taxonomic information from taxonomies like WordNet. This taxonomic enrichment can be done by combining natural-corpus embeddings with taxonomic embeddings. This paper conducts a deep analysis of this assumption and shows that both the size of the natural corpus and the random-walk coverage of the WordNet structure play a crucial role in the performance of combined (enriched) vectors in both similarity tasks. Specifically, we show that embeddings trained on medium-sized natural corpora benefit the most from taxonomic enrichment, whilst embeddings trained on large natural corpora benefit from this enrichment only when evaluated on taxonomic similarity tasks. This implies that care must be taken in controlling the size of the natural corpus and the size of the random walk used to train vectors. In addition, we find that, whilst the WordNet structure is finite and it is possible to fully traverse it in a single pass, the repetition of well-connected WordNet concepts in extended random walks effectively reinforces taxonomic relations in the learned embeddings.
Keyword:
Computational Engineering; Computational Linguistics; retrofitting; semantic similarity; taxonomic embeddings; taxonomic enrichment; word embeddings; WordNet
URL:
https://arrow.tudublin.ie/cgi/viewcontent.cgi?article=1090&context=scschcomart
https://arrow.tudublin.ie/scschcomart/83
BASE
© 2013 - 2024 Lin|gu|is|tik