1. Learning Meta Word Embeddings by Unsupervised Weighted Concatenation of Source Embeddings
   Source: BASE
2. I Wish I Would Have Loved This One, But I Didn't -- A Multilingual Dataset for Counterfactual Detection in Product Reviews
   Source: BASE
3. Language-Independent Tokenisation Rivals Language-Specific Tokenisation for Word Similarity Prediction
   Abstract: Language-independent tokenisation (LIT) methods that do not require labelled language resources or lexicons have recently gained popularity because of their applicability in resource-poor languages. Moreover, they compactly represent a language using a fixed-size vocabulary and can efficiently handle unseen or rare words. On the other hand, language-specific tokenisation (LST) methods have a long and established history, and are developed using carefully created lexicons and training resources. Unlike the subtokens produced by LIT methods, LST methods produce valid morphological subwords. Despite the contrasting trade-offs between LIT and LST methods, their performance on downstream NLP tasks remains unclear. In this paper, we empirically compare the two approaches using semantic similarity measurement as an evaluation task across a diverse set of languages. Our experimental results covering eight languages show that LST consistently outperforms LIT when the vocabulary size is large, but LIT can produce ...
   Comment: To appear in the 12th Language Resources and Evaluation (LREC 2020) Conference
   Keywords: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); FOS: Computer and information sciences; Machine Learning (cs.LG)
   URL: https://arxiv.org/abs/2002.11004
   DOI: https://dx.doi.org/10.48550/arxiv.2002.11004
   Source: BASE
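   To make the LIT/LST contrast in the abstract above concrete, here is a minimal Python sketch (not code from the paper) of a LIT-style tokeniser: a greedy longest-match subword splitter over a small fixed vocabulary, in the spirit of BPE/WordPiece. The vocabulary and example word below are invented for illustration; real LIT systems learn their vocabulary from raw text.

   # Minimal sketch of language-independent (LIT-style) subword tokenisation.
   # Greedy longest-match over a fixed vocabulary; the toy vocabulary below
   # is invented for illustration, not taken from the paper.

   def subword_tokenise(word: str, vocab: set[str]) -> list[str]:
       """Greedily split `word` into the longest subwords found in `vocab`.

       Out-of-vocabulary spans fall back to single characters, so any input
       can be represented with a fixed-size vocabulary -- the property the
       abstract highlights for handling unseen or rare words.
       """
       tokens, start = [], 0
       while start < len(word):
           # Try the longest candidate first, shrinking until a match is found.
           for end in range(len(word), start, -1):
               piece = word[start:end]
               if piece in vocab or end == start + 1:
                   tokens.append(piece)
                   start = end
                   break
       return tokens

   # Toy vocabulary: frequent character n-grams, not morphemes.
   vocab = {"token", "tok", "isa", "ation", "is", "at", "ion", "s"}
   print(subword_tokenise("tokenisation", vocab))
   # -> ['token', 'isa', 't', 'ion']  (statistically plausible pieces)

   The pieces above need not be linguistically meaningful; an LST tokeniser backed by a language-specific lexicon would instead return valid morphological subwords, e.g. something like ['token', 'ise', 'ation'].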
4. Gender-preserving Debiasing for Pre-trained Word Embeddings
   Source: BASE
5. Think Globally, Embed Locally --- Locally Linear Meta-embedding of Words
   Source: BASE