1 | Training dataset and dictionary sizes matter in BERT models: the case of Baltic languages

5 | List of single-word male and female occupations in Slovenian

6 | SloBERTa: Slovene monolingual large pretrained masked language model

8 | Evaluation of contextual embeddings on less-resourced languages

10 | Slovenian RoBERTa contextual embeddings model: SloBERTa 1.0

12 | A Resource for Evaluating Graded Word Similarity in Context: CoSimLex

14 | FinEst BERT and CroSloEngual BERT: less is more in multilingual models

17 | SemEval-2020 Task 3: Graded Word Similarity in Context
Santos Armendariz, Carlos; Purver, Matthew; Pollak, Senja. In: Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020). International Committee for Computational Linguistics, 2020. https://www.aclweb.org/anthology/2020.semeval-1.3

18 | ELMo embeddings models for seven languages
Ulčar, Matej. Faculty of Computer and Information Science, University of Ljubljana, 2019.

Abstract:
ELMo language models (https://github.com/allenai/bilm-tf) used to produce contextual word embeddings, trained on large monolingual corpora for seven languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and Swedish. Each language's model was trained for approximately 10 epochs. The training corpora range in size from over 270 M tokens for Latvian to almost 2 B tokens for Croatian. For each language, the roughly 1 million most common tokens were provided as the vocabulary during training. The models can nevertheless infer out-of-vocabulary (OOV) words, since the neural network input is at the character level. Each model is packaged in its own .tar.gz archive consisting of two files: PyTorch weights (.hdf5) and options (.json); both are needed for model inference with the allennlp Python library (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md).

Keywords:
contextual embeddings; Croatian language; ELMo; Estonian language; Finnish language; Latvian language; Lithuanian language; Slovenian language; Swedish language; word embeddings

URL: http://hdl.handle.net/11356/1277
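
A minimal inference sketch following the abstract's pointer to the allennlp library; the archive file names and the Slovenian example sentence are illustrative assumptions, not part of the record. Elmo and batch_to_ids are the utilities documented in the allennlp ELMo tutorial linked above.

import torch
from allennlp.modules.elmo import Elmo, batch_to_ids

# The two files unpacked from one model's .tar.gz archive (names assumed here).
options_file = "options.json"
weight_file = "slovenian-elmo-weights.hdf5"

# Request one weighted combination of the biLM layers; disable dropout for inference.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.0)

# Input is pre-tokenized; batch_to_ids maps each token to character ids,
# which is why out-of-vocabulary words still receive embeddings.
sentences = [["To", "je", "primer", "stavka", "."]]
character_ids = batch_to_ids(sentences)

with torch.no_grad():
    output = elmo(character_ids)

# One tensor per requested representation, shaped (batch, sequence_length, 1024).
embeddings = output["elmo_representations"][0]
print(embeddings.shape)

Setting num_output_representations higher would yield several independently weighted mixes of the biLM layers, which is only useful when feeding ELMo into multiple downstream tasks at once.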

20 | ELMo embeddings model, Slovenian
Ulčar, Matej. Faculty of Computer and Information Science, University of Ljubljana, 2019.