1 |
ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization ...
BASE
|
|
|
|
3 |
SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary ...
Abstract:
Sports game summarization aims to generate news articles from live text commentaries. A recent state-of-the-art work, SportsSum, not only constructs a large benchmark dataset but also proposes a two-step framework. Despite its contributions, the work has three main drawbacks: 1) noise in the SportsSum dataset degrades summarization performance; 2) neglecting the lexical overlap between news and commentaries results in a low-quality pseudo-labeling algorithm; 3) directly concatenating rewritten sentences to form news limits its practicality. In this paper, we publish a new benchmark dataset, SportsSum2.0, together with a modified summarization framework. In particular, to obtain a clean dataset, we employ crowd workers to manually clean the original dataset. Moreover, the degree of lexical overlap is incorporated into the generation of pseudo labels. Further, we introduce a reranker-enhanced summarizer to take into account the fluency and expressiveness of the summarized news. ... : Accepted as a short paper in CIKM 2021 ...
|
|
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
|
|
URL: https://arxiv.org/abs/2110.05750 https://dx.doi.org/10.48550/arxiv.2110.05750
|
|
BASE
|
|