
Search in the Catalogues and Directories

Hits 1 – 20 of 47

1
Cross-Lingual Word Embedding Refinement by $\ell_{1}$ Norm Optimisation ...
Abstract: Cross-Lingual Word Embeddings (CLWEs) encode words from two or more languages in a shared high-dimensional space in which vectors representing words with similar meaning (regardless of language) are closely located. Existing methods for building high-quality CLWEs learn mappings that minimise the $\ell_{2}$ norm loss function. However, this optimisation objective has been demonstrated to be sensitive to outliers. Based on the more robust Manhattan norm (aka. $\ell_{1}$ norm) goodness-of-fit criterion, this paper proposes a simple post-processing step to improve CLWEs. An advantage of this approach is that it is fully agnostic to the training process of the original CLWEs and can therefore be applied widely. Extensive experiments are performed involving ten diverse languages and embeddings trained on different corpora. Evaluation results based on bilingual lexicon induction and cross-lingual transfer for natural language inference tasks show that the $\ell_{1}$ refinement substantially outperforms four ... : To appear at NAACL 2021 ...
Keyword: Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG; Machine Learning stat.ML
URL: https://dx.doi.org/10.48550/arxiv.2104.04916
https://arxiv.org/abs/2104.04916
BASE
2
Cross-Lingual Word Embedding Refinement by $\ell_{1}$ Norm Optimisation ...
NAACL 2021 2021; Lin, Chenghua; Peng, Xutan. - : Underline Science Inc., 2021
BASE
3
SciBabel : a system for crowd-sourced validation of automatic translations of scientific texts
BASE
4
A word sense disambiguation corpus for Urdu
BASE
5
A Sense Annotated Corpus for All-Words Urdu Word Sense Disambiguation
BASE
6
A word sense disambiguation corpus for Urdu [Journal]
Saeed, Ali [Author]; Nawab, Rao Muhammad Adeel [Author]; Stevenson, Mark [Author].
DNB Subject Category Language
7
Accelerating corpus search using multiple cores
Rábara, Radoslav [Author]; Rychlý, Pavel [Author]; Herman, Ondřej [Author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
8
Are web corpora inferior? The Case of Czech and Slovak
Benko, Vladimír [Author]; Bański, Piotr [Editor]; Kupietz, Marc [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
9
Creating CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes - The National Corpus of Contemporary Welsh)
Knight, Dawn [Author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
10
EuReCo - Joining Forces for a European Reference Corpus as a sustainable base for cross-linguistic research
Kupietz, Marc [Author] [Editor]; Witt, Andreas [Author]; Bański, Piotr [Author] [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
11
CMC Corpora in DeReKo
Lüngen, Harald [Author] [Editor]; Kupietz, Marc [Author] [Editor]; Bański, Piotr [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
12
From ICE to ICC: The new International Comparable Corpus
Kirk, John [Author]; Čermáková, Anna [Author]; Bański, Piotr [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
13
Intra-connecting an exemplary literary corpus with semantic web technologies for exploratory literary studies
Dittrich, Andreas [Author]; Bański, Piotr [Editor]; Kupietz, Marc [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
14
Keeping Properties with the Data CL-MetaHeaders - An Open Specification
Vidler, John [Author]; Wattam, Stephen [Author]; Bański, Piotr [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
15
Removing spam from web corpora through supervised learning using FastText
Suchomel, Vít [Author]; Bański, Piotr [Editor]; Kupietz, Marc [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
16
Organizing corpora at the Stanford Literary Lab. Balancing simplicity and flexibility in metadata management
McClure, David [Author]; Algee-Hewitt, Mark [Author]; Douris, Steele [Author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
17
Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017
Bański, Piotr [Editor]; Kupietz, Marc [Editor]; Lüngen, Harald [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
18
Web corpora - the best possible solution for tracking rare phenomena in underresourced languages: clitics in Bosnian, Croatian and Serbian
Lüngen, Harald [Editor]; Hansen, Björn [Author]; Breiteneder, Evelyn [Editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2017
DNB Subject Category Language
19
Improving distant supervision using inference learning ...
BASE
20
Exploring relation types for literature-based discovery
Preiss, Judita; Stevenson, Mark; Gaizauskas, Robert. - : Oxford University Press, 2015
BASE

