1. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
In: Proceedings of the 1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), Aug 2021, Online, France, pp. 96-120. DOI: 10.18653/v1/2021.gem-1.10. https://hal.archives-ouvertes.fr/hal-03466171
2. The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics ...
3. Towards Syntax-Aware Dialogue Summarization using Multi-task Learning ...
4. Who speaks like a style of Vitamin: Towards Syntax-Aware Dialogue Summarization using Multi-task Learning ...
5. Natural language processing methods are sensitive to sub-clinical linguistic differences in schizophrenia spectrum disorders
In: NPJ Schizophr (2021)
6. Measuring the 'I don't know' Problem through the Lens of Gricean Quantity ...
Abstract: We consider the intrinsic evaluation of neural generative dialog models through the lens of Grice's Maxims of Conversation (1975). Based on the maxim of Quantity (be informative), we propose Relative Utterance Quantity (RUQ) to diagnose the 'I don't know' problem, in which a dialog system produces generic responses. The linguistically motivated RUQ diagnostic compares the model score of a generic response to that of the reference response. We find that for reasonable baseline models, 'I don't know' is preferred over the reference the majority of the time, but this can be reduced to less than 5% with hyperparameter tuning. RUQ allows for the direct analysis of the 'I don't know' problem, which has been addressed but not analyzed by prior work. To appear at NAACL 2021.
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2010.12786 https://dx.doi.org/10.48550/arxiv.2010.12786
7. SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training ...
8. Complexity-Weighted Loss and Diverse Reranking for Sentence Simplification ...
9. Comparison of Diverse Decoding Methods from Conditional Language Models ...
10. Learning Word Ratings for Empathy and Distress from Document-Level User Responses ...
11. Unsupervised Post-processing of Word Vectors via Conceptor Negation ...