Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	A Resource for Evaluating Graded Word Similarity in Context: CoSimLex
	Armendariz, Carlos; Purver, Matthew; Ulčar, Matej. - : Queen Mary University, 2020
	BASE

2	ELMo embeddings models for seven languages
	Ulčar, Matej. - : Faculty of Computer and Information Science, University of Ljubljana, 2019
	Abstract: ELMo language models (https://github.com/allenai/bilm-tf) for producing contextual word embeddings, trained on large monolingual corpora for seven languages: Slovenian, Croatian, Finnish, Estonian, Latvian, Lithuanian and Swedish. Each language's model was trained for approximately 10 epochs. The training corpora range in size from just over 270 M tokens (Latvian) to almost 2 B tokens (Croatian). For each language model, the roughly 1 million most common tokens were provided as the vocabulary during training. The models can also produce embeddings for out-of-vocabulary (OOV) words, since the neural network input is at the character level. Each model is distributed in its own .tar.gz archive containing two files: PyTorch weights (.hdf5) and options (.json). Both files are needed for model inference with the allennlp Python library (https://github.com/allenai/allennlp/blob/master/tutorials/how_to/elmo.md).
	Keyword: contextual embeddings; Croatian language; ELMo; Estonian language; Finnish language; Latvian language; Lithuanian language; Slovenian language; Swedish language; word embeddings
	URL: http://hdl.handle.net/11356/1277
	BASE
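	A minimal sketch of how one of the archives described in the abstract might be unpacked and used. It assumes the archive holds exactly one .json options file and one .hdf5 weights file, as the abstract states; the function names `extract_elmo_archive` and `embed_tokens` are hypothetical, and the optional embedding step assumes an older allennlp release (0.x) that still ships `allennlp.commands.elmo.ElmoEmbedder`.

```python
import shutil
import tarfile
from pathlib import Path


def extract_elmo_archive(archive_path, target_dir):
    """Unpack one model archive and return (options_json, weights_hdf5) paths.

    Only the .json and .hdf5 members are copied out, and they are written
    flat into target_dir under their base names.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    found = {}
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            suffix = Path(member.name).suffix
            if not member.isfile() or suffix not in (".json", ".hdf5"):
                continue
            if suffix in found:
                raise ValueError(f"more than one {suffix} file in archive")
            out_path = target / Path(member.name).name
            with tar.extractfile(member) as src, open(out_path, "wb") as dst:
                shutil.copyfileobj(src, dst)
            found[suffix] = out_path
    if set(found) != {".json", ".hdf5"}:
        raise ValueError("expected one .json and one .hdf5 file in archive")
    return found[".json"], found[".hdf5"]


def embed_tokens(options_file, weights_file, tokens):
    """Embed one tokenized sentence (assumption: allennlp 0.x is installed)."""
    # Imported lazily so that the extraction helper works without allennlp.
    from allennlp.commands.elmo import ElmoEmbedder

    embedder = ElmoEmbedder(
        options_file=str(options_file), weight_file=str(weights_file)
    )
    # Returns one array of per-token vectors for each ELMo layer.
    return embedder.embed_sentence(tokens)
```

	For example, `options, weights = extract_elmo_archive("model.tar.gz", "elmo_model")` followed by `embed_tokens(options, weights, ["To", "je", "stavek", "."])`, where the archive name is a placeholder for whichever language archive was downloaded.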

3	Multilingual Culture-Independent Word Analogy Datasets
	Ulčar, Matej; Vaik, Kristiina; Lindström, Jessica. - : Faculty of Computer and Information Science, University of Ljubljana, 2019
	BASE
