1 | USCORE: An Effective Approach to Fully Unsupervised Evaluation Metrics for Machine Translation
2 | Constrained Density Matching and Modeling for Cross-lingual Alignment of Contextualized Representations
3 | Towards Explainable Evaluation Metrics for Natural Language Generation
4 | End-to-end style-conditioned poetry generation: What does it take to learn from examples alone?
6 | Changes in European Solidarity Before and During COVID-19: Evidence from a Large Crowd- and Expert-Annotated Twitter Dataset
7 | BERT-Defense: A Probabilistic Model Based on BERT to Combat Cognitively Inspired Orthographic Adversarial Attacks
8 | Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors

Abstract:
Evaluation metrics are a key ingredient for progress of text generation systems. In recent years, several BERT-based evaluation metrics have been proposed (including BERTScore, MoverScore, BLEURT, etc.) which correlate much better with human assessment of text generation quality than BLEU or ROUGE, invented two decades ago. However, little is known about what these metrics, which are based on black-box language model representations, actually capture (it is typically assumed they model semantic similarity). In this work, we use a simple regression-based global explainability technique to disentangle metric scores along linguistic factors, including semantics, syntax, morphology, and lexical overlap. We show that the different metrics capture all aspects to some degree, but that they are all substantially sensitive to lexical overlap, just like BLEU and ROUGE. This exposes limitations of these newly proposed metrics, which we also highlight in an adversarial test scenario. (EMNLP 2021 Camera Ready)

Keywords:
Computation and Language (cs.CL); FOS: Computer and information sciences

URL: https://arxiv.org/abs/2110.04399
DOI: https://dx.doi.org/10.48550/arxiv.2110.04399
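The abstract of entry 8 mentions a "simple regression-based global explainability technique" for disentangling metric scores along linguistic factors. The paper's exact setup is not reproduced here, but the core idea can be sketched as an ordinary least-squares regression of metric scores on per-example factor scores, where the fitted coefficients serve as global attributions. The factor names come from the abstract; the data and the "true" weights below are synthetic and purely illustrative.

```python
# Hedged sketch: regress metric scores on linguistic-factor scores and read
# the coefficients as global factor attributions. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical per-example factor scores in [0, 1] (names from the abstract).
factors = {
    "semantics":       rng.uniform(0, 1, n),
    "syntax":          rng.uniform(0, 1, n),
    "morphology":      rng.uniform(0, 1, n),
    "lexical_overlap": rng.uniform(0, 1, n),
}
X = np.column_stack(list(factors.values()))

# Simulate a metric driven mostly by semantics but still substantially
# sensitive to lexical overlap (the paper's central finding).
true_w = np.array([0.5, 0.1, 0.05, 0.35])
metric_scores = X @ true_w + rng.normal(0, 0.01, n)

# Ordinary least squares with an intercept column; the fitted coefficients
# approximate each factor's global contribution to the metric.
design = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, metric_scores, rcond=None)
attribution = dict(zip(factors, coef[:4]))
```

On this synthetic data, `attribution["lexical_overlap"]` recovers a large coefficient, which is how such a regression would expose a metric's reliance on surface overlap rather than semantics alone.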
9 | Global Explainability of BERT-Based Evaluation Metrics by Disentangling along Linguistic Factors
10 | Inducing Language-Agnostic Multilingual Representations
11 | Probing Multilingual BERT for Genetic and Typological Signals
12 | On the Limitations of Cross-lingual Encoders as Exposed by Reference-Free Machine Translation Evaluation
13 | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
14 | Vec2Sent: Probing Sentence Embeddings With Natural Language Generation
15 | From Hero to Zéroe: A Benchmark of Low-Level Adversarial Attacks
16 | On the limitations of cross-lingual encoders as exposed by reference-free machine translation evaluation
17 | On aligning OpenIE extractions with Knowledge Bases: A case study
18 | Semantic Change and Emerging Tropes In a Large Corpus of New High German Poetry
19 | Cross-lingual Argumentation Mining: Machine Translation (and a bit of Projection) is All You Need!
20 | What is the Essence of a Claim? Cross-Domain Claim Identification