43 | Does Putting a Linguist in the Loop Improve NLU Data Collection ...
46 | Say 'YES' to Positivity: Detecting Toxic Language in Workplace Communications ...
47 | Unsupervised Multi-View Post-OCR Error Correction With Language Models ...
48 | AttentionRank: Unsupervised Keyphrase Extraction using Self and Cross Attentions ...
49 | ProtoInfoMax: Prototypical Networks with Mutual Information Maximization for Out-of-Domain Detection ...
50 | Multi-granularity Textual Adversarial Attack with Behavior Cloning ...
51 | Automatic Fact-Checking with Document-level Annotations using BERT and Multiple Instance Learning ...
52 | Towards the Early Detection of Child Predators in Chat Rooms: A BERT-based Approach ...
53 | TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning ...
Abstract:
Learning sentence embeddings often requires a large amount of labeled data. However, for most tasks and domains, labeled data is seldom available and creating it is expensive. In this work, we present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE) which outperforms previous approaches by up to 6.4 points. It can achieve up to 93.1% of the performance of in-domain supervised approaches. Further, we show that TSDAE is a strong domain adaptation and pre-training method for sentence embeddings, significantly outperforming other approaches like Masked Language Model. A crucial shortcoming of previous studies is the narrow evaluation: Most work mainly evaluates on the single task of Semantic Textual Similarity (STS), which does not require any domain knowledge. It is unclear if these proposed methods generalize to other domains and tasks. We fill this gap and evaluate TSDAE and other recent approaches on four different datasets from ...
Keywords: Computational Linguistics; Machine Learning; Natural language generation
URL: https://underline.io/lecture/39314-tsdae-using-transformer-based-sequential-denoising-auto-encoder-for-unsupervised-sentence-embedding-learning
DOI: https://dx.doi.org/10.48448/ymtp-md67
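The abstract describes training a denoising auto-encoder on top of a pre-trained Transformer: each input sentence is corrupted, pooled into a single vector, and a decoder is trained to reconstruct the original sentence from that vector. The snippet below is a minimal sketch of this idea using the sentence-transformers library, which ships a DenoisingAutoEncoderDataset and DenoisingAutoEncoderLoss; the base model, example sentences, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal TSDAE-style unsupervised training sketch with sentence-transformers.
# Model name, corpus, and hyperparameters are placeholders, not the paper's setup.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

# Encoder: pre-trained Transformer + CLS pooling -> one fixed-size vector per sentence.
word_embedding_model = models.Transformer("bert-base-uncased")
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Unlabeled, in-domain sentences are the only training signal.
train_sentences = [
    "Sentence embeddings map text to fixed-size vectors.",
    "Unsupervised methods avoid the cost of labeled data.",
    # ... more raw sentences from the target domain
]

# The dataset corrupts each sentence (token deletion by default; the default noise
# function relies on nltk word tokenization, so nltk's 'punkt' data may be needed).
# The loss trains a decoder to reconstruct the original sentence from the pooled vector.
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path="bert-base-uncased", tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    weight_decay=0,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)

# After training, only the encoder is kept and used to embed new sentences.
embeddings = model.encode(["A new sentence to embed."])
```

After this unsupervised stage, the encoder can be evaluated on or adapted to downstream tasks, in line with the abstract's framing of TSDAE as a domain adaptation and pre-training method.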
54 | WebSRC: A Dataset for Web-Based Structural Reading Comprehension ...
55 | Improving Math Word Problems with Pre-trained Knowledge and Hierarchical Reasoning ...
56 | Semantic Categorization of Social Knowledge for Commonsense Question Answering ...
57 | Adversarial Examples for Evaluating Math Word Problem Solvers ...
58 | Pre-train or Annotate? Domain Adaptation with a Constrained Budget ...
60 | Learning with Different Amounts of Annotation: From Zero to Many Labels ...