2 |
What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think
4 |
OTTers: One-turn Topic Transitions for Open-Domain Dialogue ...
5 |
ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI ...
6 |
What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think ...
7 |
MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization ...
9 |
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definition
10 |
SLURP: A Spoken Language Understanding Resource Package ...
11 |
SLURP: A Spoken Language Understanding Resource Package ...
12 |
SLURP: A Spoken Language Understanding Resource Package ...
13 |
Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas ...
14 |
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge ...
15 |
RankME: Reliable Human Ratings for Natural Language Generation ...
16 |
Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity ...
18 |
The E2E Dataset: New Challenges For End-to-End Generation ...