1 | On Learning Language-Invariant Representations for Universal Machine Translation
BASE
2 | Extending and Improving Wordnet via Unsupervised Word Embeddings
3 | Linear Algebraic Structure of Word Senses, with Applications to Polysemy
4 | A Latent Variable Model Approach to PMI-based Word Embeddings
Abstract:
Semantic word embeddings represent the meaning of a word via a vector, and are created by diverse methods. Many use nonlinear operations on co-occurrence statistics, and have hand-tuned hyperparameters and reweighting methods. This paper proposes a new generative model, a dynamic version of the log-linear topic model of~\citet{mnih2007three}. The methodological novelty is to use the prior to compute closed-form expressions for word statistics. This provides a theoretical justification for nonlinear models like PMI, word2vec, and GloVe, as well as some hyperparameter choices. It also helps explain why low-dimensional semantic embeddings contain linear algebraic structure that allows solution of word analogies, as shown by~\citet{mikolov2013efficient} and many subsequent papers. Experimental support is provided for the generative model assumptions, the most important of which is that latent word vectors are fairly uniformly dispersed in space.

Appears in Transactions of the Association for Computational Linguistics (TACL), 2016
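The linear algebraic structure the abstract refers to is the observation that word analogies such as "man is to king as woman is to ?" can be solved by vector arithmetic on the embeddings. A minimal sketch, using a hypothetical toy embedding table (the vectors below are illustrative only; real embeddings would come from word2vec, GloVe, or the paper's generative model):

```python
import math

# Hypothetical 4-dimensional embeddings, chosen so the analogy works on toy data.
embeddings = {
    "king":  [0.8, 0.6, 0.1, 0.0],
    "man":   [0.7, 0.1, 0.1, 0.0],
    "woman": [0.7, 0.1, 0.9, 0.0],
    "queen": [0.8, 0.6, 0.9, 0.0],
    "apple": [0.0, 0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def solve_analogy(a, b, c):
    """Answer 'a is to b as c is to ?': return the vocabulary word
    (excluding the query words) closest to vec(b) - vec(a) + vec(c)."""
    target = [xb - xa + xc
              for xa, xb, xc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

print(solve_analogy("man", "king", "woman"))  # prints "queen" on this toy data
```

The offset vec(king) - vec(man) + vec(woman) lands exactly on vec(queen) in this toy table; with real low-dimensional embeddings the match is approximate, which is the phenomenon the paper's model aims to explain.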

Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences; Machine Learning (cs.LG); Machine Learning (stat.ML)
URL: https://dx.doi.org/10.48550/arxiv.1502.03520 https://arxiv.org/abs/1502.03520