
Search in the Catalogues and Directories

Hits 1 – 6 of 6

1
Universal Dependencies 1.2 Models for UDPipe
Straka, Milan. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016
BASE
2
DeriNet 1.2
Vidra, Jonáš; Žabokrtský, Zdeněk; Ševčíková, Magda. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016
BASE
3
Czech Models (MorfFlex CZ 160310 + PDT 3.0) for MorphoDiTa 160310
Straka, Milan; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016
BASE
4
WordSim353-cs: Evaluation Dataset for Lexical Similarity and Relatedness, based on WordSim353
Cinková, Silvie; Straková, Jana; Hajič, Jakub; Hajič, Jan; Hajič, Jan, jr.; Janoušková, Jolana; Straka, Milan; Urešová, Miroslava. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016
Abstract: Czech translation of WordSim353. Czech translations of the English WordSim353 word pairs were obtained from four translators. All translation variants were scored by 25 Czech annotators according to the lexical similarity/relatedness annotation instructions for WordSim353 annotators. The resulting data set consists of two annotation files: "WordSim353-cs.csv" and "WordSim-cs-Multi.csv". Both files are encoded in UTF-8 and have a header; text is enclosed in double quotes, and columns are separated by commas. The rows are numbered. The WordSim-cs-Multi data set has rows numbered from 1 to 634, whereas the row indices in the WordSim353-cs data set reflect the corresponding row numbers in the WordSim-cs-Multi data set. The WordSim353-cs file contains a one-to-one selection of 353 Czech equivalent pairs whose judgments proved most similar to the judgments of their corresponding English originals (measured by the absolute value of the difference between the means over all annotators in each language). In one case ("psychology-cognition"), two Czech equivalent pairs had identical means as well as confidence intervals, so we randomly selected one. The "WordSim-cs-Multi.csv" file contains human judgments for all translation variants. In both data sets, we preserved all 25 individual scores. In the WordSim353-cs data set, we added columns with the Czech means and the original English means, with the 95% confidence intervals for each mean in separate columns (computed by the CI function in the Rmisc R package). The WordSim-cs-Multi data set contains only the Czech means and confidence intervals. To make lexical search convenient, both data sets provide separate columns with the respective Czech and English single words, the entire word pairs, and finally an English-Czech quadruple.
The data set also contains an xls table with the four translations and a preliminary selection of the best variants performed by an adjudicator.
Keyword: Czech language; distributional semantics; English language; evaluation; lexical semantics; relatedness; similarity
URL: http://hdl.handle.net/11234/1-1713
BASE
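The selection rule described in the abstract (for each English pair, keep the Czech variant whose mean score is closest to the English mean) can be illustrated with a minimal Python sketch. The abstract does not give the actual column names, so `row`, `en_pair`, `cs_pair`, `en_mean`, and `s1` to `s3` are hypothetical, and the sample rows and scores are invented for illustration only. Only three score columns are used instead of 25. The sketch assumes just the file format stated above: UTF-8, a header row, double-quoted text, and comma-separated columns.

```python
import csv
import io
from statistics import mean

# Invented miniature of a "Multi"-style file. Column names and all values
# are hypothetical; the real files carry 25 individual score columns.
SAMPLE = '''"row","en_pair","cs_pair","en_mean","s1","s2","s3"
"1","tiger-cat","tygr-kočka","7.35","7","8","6"
"2","tiger-cat","tygr-kocour","7.35","5","6","4"
"3","book-paper","kniha-papír","7.46","7","8","8"
'''


def select_closest(csv_text, score_columns):
    """Per English pair, keep the Czech variant whose mean score has the
    smallest absolute difference from the English mean."""
    best = {}
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=",", quotechar='"')
    for record in reader:
        cs_mean = mean(float(record[col]) for col in score_columns)
        diff = abs(cs_mean - float(record["en_mean"]))
        current = best.get(record["en_pair"])
        # On an exact tie this sketch keeps the first variant seen;
        # the abstract reports that the authors picked one at random.
        if current is None or diff < current[0]:
            best[record["en_pair"]] = (diff, record["row"], record["cs_pair"])
    # Return the original row number with the chosen pair, mirroring how the
    # selected file keeps the row indices of the full file.
    return {en: (row_id, cs) for en, (_, row_id, cs) in best.items()}


selected = select_closest(SAMPLE, ["s1", "s2", "s3"])
```

With a real file, the text would be read with `open(path, encoding="utf-8", newline="")` and the column names taken from its actual header.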
5
UDPipe
Straka, Milan; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016
BASE
6
Czech Models (MorfFlex CZ 161115 + PDT 3.0) for MorphoDiTa 161115
Straka, Milan; Straková, Jana. - : Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL), 2016
BASE

Hits by source type: Catalogues 0; Bibliographies 0; Linked Open Data catalogues 0; Online resources 0; Open access documents 6