43 |
Does Putting a Linguist in the Loop Improve NLU Data Collection ...
Abstract:
Many crowdsourced NLP datasets contain systematic artifacts that are identified only after data collection is complete. Earlier identification of these issues should make it easier to create high-quality training and evaluation data. We attempt this by evaluating protocols in which expert linguists work 'in the loop' during data collection to identify and address these issues by adjusting task instructions and incentives. Using natural language inference as a test case, we compare three data collection protocols: (i) a baseline protocol with no linguist involvement, (ii) a linguist-in-the-loop intervention with iteratively-updated constraints on the annotation task, and (iii) an extension that adds direct interaction between linguists and crowdworkers via a chatroom. We find that linguist involvement does not lead to increased accuracy on out-of-domain test sets compared to baseline, and adding a chatroom has no effect on the data. Linguist involvement does, however, lead to more challenging evaluation data ...
URL: https://underline.io/lecture/38540-does-putting-a-linguist-in-the-loop-improve-nlu-data-collection https://dx.doi.org/10.48448/xq86-q893
BASE
46 |
Say `YES' to Positivity: Detecting Toxic Language in Workplace Communications ...
BASE
47 |
Unsupervised Multi-View Post-OCR Error Correction With Language Models ...
BASE
48 |
AttentionRank: Unsupervised Keyphrase Extraction using Self and Cross Attentions ...
BASE
49 |
ProtoInfoMax: Prototypical Networks with Mutual Information Maximization for Out-of-Domain Detection ...
BASE
50 |
Multi-granularity Textual Adversarial Attack with Behavior Cloning ...
BASE
51 |
Automatic Fact-Checking with Document-level Annotations using BERT and Multiple Instance Learning ...
BASE
52 |
Towards the Early Detection of Child Predators in Chat Rooms: A BERT-based Approach ...
BASE
53 |
TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning ...
BASE
54 |
WebSRC: A Dataset for Web-Based Structural Reading Comprehension ...
BASE
55 |
Improving Math Word Problems with Pre-trained Knowledge and Hierarchical Reasoning ...
BASE
56 |
Semantic Categorization of Social Knowledge for Commonsense Question Answering ...
BASE
57 |
Adversarial Examples for Evaluating Math Word Problem Solvers ...
BASE
58 |
Pre-train or Annotate? Domain Adaptation with a Constrained Budget ...
BASE
60 |
Learning with Different Amounts of Annotation: From Zero to Many Labels ...
BASE