
Hits 1 – 7 of 7

1	The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
	Gehrmann, Sebastian; Adewumi, Tosin; Aggarwal, Karmanya...
	In: Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), Aug 2021, Online, France, pp. 96-120, doi:10.18653/v1/2021.gem-1.10 ; https://hal.archives-ouvertes.fr/hal-03466171
	BASE

2	The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics ...
	Gehrmann, Sebastian; Adewumi, Tosin; Aggarwal, Karmanya. - : arXiv, 2021
	BASE

3	Twenty Years of Confusion in Human Evaluation: NLG Needs Evaluation Sheets and Standardised Definitions
	van Miltenburg, Emiel; Howcroft, David; Rieser, Verena; Gkatzia, Dimitra; Santhanam, Sashank; Hasan, Sadid A.; Belz, Anya; Clinciu, Miruna; Mille, Simon; Mahamood, Saad. - : Association for Computational Linguistics (ACL), 2020
	Abstract: Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it difficult to compare results and draw conclusions across papers, with adverse implications for meta-evaluation and reproducibility. In this paper, we present (i) our dataset of 165 NLG papers with human evaluations, (ii) the annotation scheme we developed to label the papers for different aspects of evaluations, (iii) quantitative analyses of the annotations, and (iv) a set of recommendations for improving standards in evaluation reporting. We use the annotations as a basis for examining information included in evaluation reports, and levels of consistency in approaches, experimental design and terminology, focusing in particular on the 200+ different terms that have been used for evaluated aspects of quality. We conclude that due to a pervasive lack of clarity in reports and extreme diversity in approaches, human evaluation in NLG presents as extremely confused in 2020, and that the field is in urgent need of standard methods and terminology.
	URL: https://napier-surface.worktribe.com/2697597/1/Twenty%20Years%20Of%20Confusion%20In%20Human%20Evaluation%3A%20NLG%20Needs%20Evaluation%20Sheets%20And%20Standardised%20Definition%20%28acepted%20version%29 http://researchrepository.napier.ac.uk/Output/2697597
	BASE

4	How do image description systems describe people? A targeted assessment of system competence in the PEOPLE-domain ...
	The 28th International Conference on Computational Linguistics 2020; van Miltenburg, Emiel. - : Underline Science Inc., 2020
	BASE

5	Neural data-to-text generation: A comparison between pipeline and end-to-end architectures ...
	Ferreira, Thiago Castro; van der Lee, Chris; van Miltenburg, Emiel. - : arXiv, 2019
	BASE

6	Cross-linguistic differences and similarities in image descriptions ...
	van Miltenburg, Emiel; Elliott, Desmond; Vossen, Piek. - : arXiv, 2017
	BASE

7	Detecting and ordering adjectival scalemates ...
	van Miltenburg, Emiel. - : arXiv, 2015
	BASE
