
Search in the Catalogues and Directories

Hits 1 – 20 of 38

1
OTTers: One-turn Topic Transitions for Open-Domain Dialogue
Sevegnani, Karin; Howcroft, David M.; Konstas, Ioannis. Association for Computational Linguistics (ACL), 2021
BASE
2
What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think
Rieser, Verena; Howcroft, David M. Association for Computational Linguistics (ACL), 2021
Abstract: Previous work has shown that human evaluations in NLP are notoriously under-powered. Here, we argue that there are two common factors which make this problem even worse: NLP studies usually (a) treat ordinal data as interval data and (b) operate under high variance settings while the differences they are hoping to detect are often subtle. We demonstrate through simulation that ordinal mixed effects models are better able to detect small differences between models, especially in high variance settings common in evaluations of generated texts. We release tools for researchers to conduct their own power analysis and test their assumptions. We also make recommendations for improving statistical power.
URL: http://researchrepository.napier.ac.uk/Output/2826175
https://napier-surface.worktribe.com/2826175/1/What%20Happens%20If%20You%20Treat%20Ordinal%20Ratings%20As%20Interval%20Data%3F%20Human%20Evaluations%20In%20%7BNLP%7D%20Are%20Even%20More%20Under-powered%20Than%20You%20Think
BASE
3
Grounded Neural Generation ...
BASE
4
OTTers: One-turn Topic Transitions for Open-Domain Dialogue ...
BASE
5
ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Detection in Conversational AI ...
BASE
6
What happens if you treat ordinal ratings as interval data? Human evaluations in NLP are even more under-powered than you think ...
BASE
7
MiRANews: Dataset and Benchmarks for Multi-Resource-Assisted News Summarization ...
BASE
8
AggGen: Ordering and Aggregating while Generating ...
BASE
9
Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions
van Miltenburg, Emiel; Howcroft, David; Rieser, Verena. Association for Computational Linguistics (ACL), 2020
BASE
10
SLURP: A Spoken Language Understanding Resource Package ...
BASE
11
SLURP: A Spoken Language Understanding Resource Package ...
BASE
12
SLURP: A Spoken Language Understanding Resource Package ...
BASE
13
Conversational Assistants and Gender Stereotypes: Public Perceptions and Desiderata for Voice Personas ...
BASE
14
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge ...
BASE
15
RankME: Reliable Human Ratings for Natural Language Generation ...
BASE
16
Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity ...
BASE
17
Findings of the E2E NLG Challenge ...
BASE
18
The E2E Dataset: New Challenges For End-to-End Generation ...
BASE
19
The E2E Challenge Dataset ...
Novikova, Jekaterina; Dušek, Ondřej; Rieser, Verena. Heriot-Watt University, 2017
BASE
20
The REAL corpus
BASE

© 2013 - 2024 Lin|gu|is|tik