Search in the Catalogues and Directories

Hits 1 – 3 of 3

1
A Resource for Evaluating Graded Word Similarity in Context: CoSimLex
Armendariz, Carlos; Purver, Matthew; Ulčar, Matej. - : Queen Mary University, 2020
BASE
2
ELMo embeddings models for seven languages
Ulčar, Matej. - : Faculty of Computer and Information Science, University of Ljubljana, 2019
Abstract: ELMo language models (https://github.com/allenai/bilm-tf) for producing contextual word embeddings, trained on large monolingual corpora for 7 languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and Swedish. Each language's model was trained for approximately 10 epochs. Training corpora range in size from over 270 M tokens for Latvian to almost 2 B tokens for Croatian. For each language model, about 1 million of the most common tokens were provided as the vocabulary during training. The models can also produce embeddings for out-of-vocabulary (OOV) words, since the neural network input is at the character level. Each model is in its own .tar.gz archive, consisting of two files: PyTorch weights (.hdf5) and options (.json). Both are needed for model inference with the allennlp (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md) Python library.
Keyword: contextual embeddings; Croatian language; ELMo; Estonian language; Finnish language; Latvian language; Lithuanian language; Slovenian language; Swedish language; word embeddings
URL: http://hdl.handle.net/11356/1277
BASE
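The abstract above says each archive holds exactly two files, weights (.hdf5) and options (.json), and that both are needed for inference. A minimal sketch of unpacking one archive and locating that pair, using only the Python standard library, might look like the following. The function name `extract_elmo_archive` and the archive name in the usage note are hypothetical and not part of the record.

```python
import tarfile
from pathlib import Path


def extract_elmo_archive(archive_path, target_dir):
    """Unpack one model archive and return (options_path, weights_path).

    Assumes the layout described in the abstract: a .tar.gz archive
    containing one options file (.json) and one weights file (.hdf5).
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as archive:
        # Extract regular files only, and skip names that would
        # escape target_dir (absolute paths or ".." components).
        members = [
            m for m in archive.getmembers()
            if m.isfile()
            and not Path(m.name).is_absolute()
            and ".." not in Path(m.name).parts
        ]
        archive.extractall(target, members=members)

    options = sorted(target.rglob("*.json"))
    weights = sorted(target.rglob("*.hdf5"))
    if len(options) != 1 or len(weights) != 1:
        raise ValueError(
            f"expected one .json and one .hdf5 file, "
            f"found {len(options)} and {len(weights)}"
        )
    return options[0], weights[0]
```

For example, `extract_elmo_archive("slovenian.tar.gz", "models/sl")` (archive name assumed) would return the two paths, which could then be passed to allennlp as described in the tutorial linked in the abstract.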
3
Multilingual Culture-Independent Word Analogy Datasets
Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica. - : Faculty of Computer and Information Science, University of Ljubljana, 2019
BASE

Open access documents: 3