22 | Efficient-FedRec: Efficient Federated Learning Framework for Privacy-Preserving News Recommendation
24 | Not All Negatives are Equal: Label-Aware Contrastive Loss for Fine-grained Text Classification
25 | Improving Graph-based Sentence Ordering with Iteratively Predicted Pairwise Orderings
28 | Unsupervised Multi-View Post-OCR Error Correction With Language Models
29 | AttentionRank: Unsupervised Keyphrase Extraction using Self and Cross Attentions
30 | ProtoInfoMax: Prototypical Networks with Mutual Information Maximization for Out-of-Domain Detection
31 | Multi-granularity Textual Adversarial Attack with Behavior Cloning
32 | Automatic Fact-Checking with Document-level Annotations using BERT and Multiple Instance Learning
33 | Towards the Early Detection of Child Predators in Chat Rooms: A BERT-based Approach
34 | TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
35 | WebSRC: A Dataset for Web-Based Structural Reading Comprehension
36 | Improving Math Word Problems with Pre-trained Knowledge and Hierarchical Reasoning
37 | Semantic Categorization of Social Knowledge for Commonsense Question Answering
38 | Pre-train or Annotate? Domain Adaptation with a Constrained Budget

Abstract: Recent work has demonstrated that pre-training in-domain language models can boost performance when adapting to a new domain. However, the costs associated with pre-training raise an important question: given a fixed budget, what steps should an NLP practitioner take to maximize performance? In this paper, we study domain adaptation under budget constraints, and approach it as a customer choice problem between data annotation and pre-training. Specifically, we measure the annotation cost of three procedural text datasets and the pre-training cost of three in-domain language models. Then we evaluate the utility of different combinations of pre-training and data annotation under varying budget constraints to assess which combination strategy works best. We find that, for small budgets, spending all funds on annotation leads to the best performance; once the budget becomes large enough, a combination of data annotation and in-domain ...

Anthology paper link: https://aclanthology.org/2021.emnlp-main.409/

Keywords: Computational Linguistics; Language Models; Machine Learning; Machine Learning and Data Mining; Natural Language Processing

URL: https://underline.io/lecture/37963-pre-train-or-annotatequestion-domain-adaptation-with-a-constrained-budget
DOI: https://dx.doi.org/10.48448/z1gf-n855
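The budget question in this abstract can be read as a small search over spending splits: each pre-training choice has a one-off cost, and whatever remains buys annotated examples. The Python sketch below is a hypothetical illustration of that framing, not the paper's code; the option names, costs, and `toy_utility` function are invented, and in practice the utility estimate would come from pilot experiments or learning-curve fits like those the paper measures.

```python
import math
from dataclasses import dataclass
from typing import Callable

@dataclass
class PretrainOption:
    name: str    # hypothetical label, e.g. "base-LM" or "in-domain-LM"
    cost: float  # one-off pre-training cost, in dollars

def best_allocation(
    budget: float,
    pretrain_options: list[PretrainOption],
    cost_per_label: float,
    utility: Callable[[str, int], float],
) -> tuple[str, int, float]:
    """Enumerate (pre-training choice, annotation spend) splits under a
    fixed budget and return the combination with the highest estimated
    utility. The utility(model_name, n_labels) estimate is supplied by
    the caller; this function only searches over feasible splits."""
    best = ("", 0, float("-inf"))
    for opt in pretrain_options:
        remaining = budget - opt.cost
        if remaining < 0:  # this pre-training choice exceeds the budget
            continue
        n_labels = int(remaining // cost_per_label)
        score = utility(opt.name, n_labels)
        if score > best[2]:
            best = (opt.name, n_labels, score)
    return best

# Toy usage with made-up numbers: diminishing returns on labels, plus a
# fixed bonus for in-domain pre-training.
options = [PretrainOption("base-LM", 0.0), PretrainOption("in-domain-LM", 500.0)]
toy_utility = lambda name, n: math.log1p(n) + (0.8 if name == "in-domain-LM" else 0.0)

for b in (200.0, 2000.0):
    name, n, score = best_allocation(b, options, cost_per_label=0.5, utility=toy_utility)
    print(f"budget ${b:.0f}: {name} + {n} labels (utility {score:.2f})")
```

Under these toy numbers the small budget goes entirely to annotation while the large budget favors the annotation-plus-pre-training split, which mirrors the qualitative finding reported in the abstract.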
40 | Learning with Different Amounts of Annotation: From Zero to Many Labels