1 |
QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension ...
Abstract:
Alongside the huge volume of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "reasoning types" in question answering and propose a new taxonomy. We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources. The study is aimed both at practitioners looking for pointers to the wealth of existing data and at researchers working on new resources. ... : Under review ...
Keyword:
Artificial Intelligence cs.AI; Computation and Language cs.CL; FOS Computer and information sciences
URL: https://arxiv.org/abs/2107.12708 https://dx.doi.org/10.48550/arxiv.2107.12708
BASE
2 |
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus ...
3 |
COVR: A Test-Bed for Visually Grounded Compositional Generalization with Real Images ...
4 |
Enforcing Consistency in Weakly Supervised Semantic Parsing ...
5 |
Understanding Mention Detector-Linker Interaction in Neural Coreference Resolution ...
6 |
Mitigating False-Negative Contexts in Multi-document Question Answering with Retrieval Marginalization ...
7 |
Tailor: Generating and Perturbing Text with Semantic Controls ...
8 |
Competency Problems: On Finding and Removing Artifacts in Language Data ...
11 |
Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering ...
12 |
Evaluating Models' Local Decision Boundaries via Contrast Sets ...
13 |
IIRC: A Dataset of Incomplete Information Reading Comprehension Questions ...
14 |
Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning ...
18 |
The Phonology of the Canadian Shift Revisited: Thunder Bay & Cape Breton
In: University of Pennsylvania Working Papers in Linguistics (2013)