1 |
Shapley Idioms: Analysing BERT Sentence Embeddings for General Idiom Token Identification
In: Front Artif Intell (2022)
BASE
3 |
English WordNet Taxonomic Random Walk Pseudo-Corpora
In: Conference papers (2020)
BASE
4 |
Language related issues for machine translation between closely related south Slavic languages
BASE
5 |
Synthetic, Yet Natural: Properties of WordNet Random Walk Corpora and the impact of rare words on embedding performance
In: Conference papers (2019)
BASE
6 |
Size Matters: The Impact of Training Size in Taxonomically-Enriched Word Embeddings
In: Articles (2019)
BASE
8 |
Quantitative Fine-Grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian ...
Abstract:
This paper presents a quantitative fine-grained manual evaluation approach to comparing the performance of different machine translation (MT) systems. We build upon the well-established Multidimensional Quality Metrics (MQM) error taxonomy and implement a novel method that assesses whether the differences in performance for MQM error types between different MT systems are statistically significant. We conduct a case study for English-to-Croatian, a language direction that involves translating into a morphologically rich language, for which we compare three MT systems belonging to different paradigms: pure phrase-based, factored phrase-based and neural. First, we design an MQM-compliant error taxonomy tailored to the relevant linguistic phenomena of Slavic languages, which made the annotation process feasible and accurate. Errors in MT outputs were then annotated by two annotators following this taxonomy. Subsequently, we carried out a statistical analysis which showed that the best-performing system (neural) ...
22 pages, 2 figures, 9 tables, 1 equation. This is a post-peer-review, pre-copyedit version of an article published in Machine Translation Journal. The final authenticated version will be available online at the journal page. arXiv admin note: substantial text overlap with arXiv:1706.04389 ...
Keyword:
68T50; Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences
URL: https://arxiv.org/abs/1802.01451 https://dx.doi.org/10.48550/arxiv.1802.01451
BASE
9 |
Is it worth it? Budget-related evaluation metrics for model selection ...
BASE
10 |
Quantitative Fine-grained Human Evaluation of Machine Translation Systems: a Case Study on English to Croatian
In: Articles (2018)
BASE
11 |
Is it worth it? Budget-related evaluation metrics for model selection
In: Conference papers (2018)
BASE
12 |
hr500k – A Reference Training Corpus of Croatian.
In: Conference papers (2018)
BASE
17 |
Fine-grained human evaluation of neural versus phrase-based machine translation ...
BASE
18 |
Fine-Grained Human Evaluation of Neural Versus Phrase-Based Machine Translation
In: Prague Bulletin of Mathematical Linguistics, Vol 108, Iss 1, Pp 121-132 (2017)
BASE