1 |
Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages ...
2 |
Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages ...
5 |
List of single-word male and female occupations in Slovenian
6 |
SloBERTa: Slovene monolingual large pretrained masked language model ...
7 |
SloBERTa: Slovene monolingual large pretrained masked language model ...
8 |
Evaluation of contextual embeddings on less-resourced languages ...
9 |
Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages ...
10 |
Slovenian RoBERTa contextual embeddings model: SloBERTa 1.0
12 |
A Resource for Evaluating Graded Word Similarity in Context: CoSimLex
14 |
FinEst BERT and CroSloEngual BERT: less is more in multilingual models ...
16 |
Multilingual Culture-Independent Word Analogy Datasets ...
Abstract:
In text processing, deep neural networks mostly use word embeddings as an input. Embeddings have to ensure that relations between words are reflected through distances in a high-dimensional numeric space. To compare the quality of different text embeddings, typically, we use benchmark datasets. We present a collection of such datasets for the word analogy task in nine languages: Croatian, English, Estonian, Finnish, Latvian, Lithuanian, Russian, Slovenian, and Swedish. We designed the monolingual analogy task to be much more culturally independent and also constructed cross-lingual analogy datasets for the involved languages. We present basic statistics of the created datasets and their initial evaluation using fastText embeddings. ...
Keywords:
analogy task; evaluation; less-resourced languages; word embeddings
URL: https://dx.doi.org/10.5281/zenodo.3894553 https://zenodo.org/record/3894553
17 |
SemEval-2020 Task 3: Graded Word Similarity in Context
Santos Armendariz, Carlos; Purver, Matthew; Pollak, Senja. In: Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020). International Committee for Computational Linguistics, 2020. https://www.aclweb.org/anthology/2020.semeval-1.3
18 |
ELMo embeddings models for seven languages
Ulčar, Matej. Faculty of Computer and Information Science, University of Ljubljana, 2019
20 |
ELMo embeddings model, Slovenian
Ulčar, Matej. Faculty of Computer and Information Science, University of Ljubljana, 2019