
Search in the Catalogues and Directories

Hits 1 – 20 of 46

1
How to train your self-supervised NLP model: Investigating pre-training objectives, data, and scale
Joshi, Mandar. - 2022
Abstract: Thesis (Ph.D.)--University of Washington, 2022; A robust language processing machine should be able to encode linguistic and factual knowledge across a wide variety of domains, languages, and even modalities. The paradigm of pre-training self-supervised models on large text corpora has driven much of the recent progress towards this goal. In spite of this large-scale pre-training, the best performing models still have to be fine-tuned on downstream tasks -- often containing hundreds of thousands of examples -- to achieve state-of-the-art performance. The aim of this thesis is twofold: (a) to design efficient, scalable pre-training methods which capture different kinds of linguistic and world knowledge, and (b) to enable better downstream performance with fewer human-labeled examples. The first part of the thesis focuses on self-supervised objectives for reasoning about relationships between pairs of words. In NLI, for example, given the premise "golf is prohibitively expensive", inferring that the hypothesis "golf is a cheap pastime" is a contradiction requires one to know that expensive and cheap are antonyms. We show that, with the right kind of self-supervised objectives, such knowledge can be learned with word pair vectors (pair2vec) directly from text, without using curated knowledge bases and ontologies. The second part of the thesis seeks to build models which encode knowledge beyond word pair relations into model parameters. We present SpanBERT, a scalable pre-training method that is designed to better represent and predict spans of text. Span-based pre-training objectives seek to efficiently encode a wider variety of knowledge, and they improve the state of the art for a range of NLP tasks. The third part of the thesis focuses on integrating dynamically retrieved textual knowledge. Specifically, even large-scale representations are not able to preserve all the factual knowledge they have "read" during pre-training, due to the long tail of entity- and event-specific information. We show that training models to integrate background knowledge during pre-training is especially useful for downstream tasks which require reasoning over this long tail. The last part of the thesis targets a major weakness of self-supervised models -- while such models require no explicit human supervision during pre-training, they still need large amounts of human-labeled downstream task data. We seek to remedy this by mining input-output pairs (and thus obtaining direct task-level supervision) from corpora, using supervision from very few labeled examples. Overall, this thesis presents a range of ideas required for effective pre-training and fine-tuning -- (a) self-supervised objectives, (b) model scale, and (c) new types of data. As we show in the following chapters, self-supervised objectives can have a large influence on the form of knowledge that is acquired during pre-training. Moreover, efficient objectives directly enable model scale, both in terms of data and parameters. Finally, the training data, and the kind of supervision derived from it, dictate how well a model can learn different kinds of downstream tasks.
Keyword: Computer science; Computer science and engineering; nlp; pretraining; representations; self supervised
URL: http://hdl.handle.net/1773/48474
BASE
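To make the SpanBERT-style objective described in the abstract above concrete, here is a minimal, hypothetical sketch of span masking in Python. It is not code from the thesis; the function name, parameter values, and the geometric span-length sampling are illustrative assumptions. The idea it shows: pick contiguous spans of tokens, replace them with a mask symbol, and record the original tokens as prediction targets for a model to recover.

import random

# Illustrative sketch (not from the thesis) of a SpanBERT-style span-masking step:
# contiguous spans are selected, with span lengths drawn from a truncated geometric
# distribution, and replaced with a [MASK] symbol; a model would then be trained to
# predict the original tokens of each masked span.

MASK = "[MASK]"

def mask_spans(tokens, mask_ratio=0.15, stop_prob=0.2, max_span_len=10, seed=0):
    """Return (masked_tokens, targets), where targets maps position -> original token."""
    rng = random.Random(seed)
    budget = max(1, int(len(tokens) * mask_ratio))  # total number of tokens to mask
    masked = list(tokens)
    targets = {}
    attempts = 0
    while budget > 0 and attempts < 100:
        attempts += 1
        # Sample a span length: keep extending with probability (1 - stop_prob).
        span_len = 1
        while span_len < max_span_len and rng.random() > stop_prob:
            span_len += 1
        span_len = min(span_len, budget)
        start = rng.randrange(0, len(tokens) - span_len + 1)
        if any(i in targets for i in range(start, start + span_len)):
            continue  # skip spans that overlap an already-masked region
        for i in range(start, start + span_len):
            targets[i] = tokens[i]
            masked[i] = MASK
        budget -= span_len
    return masked, targets

if __name__ == "__main__":
    toks = "golf is prohibitively expensive but still a popular pastime".split()
    masked, targets = mask_spans(toks)
    print(masked)   # tokens with contiguous spans replaced by [MASK]
    print(targets)  # positions of masked tokens and their original values

In an actual pre-training setup the targets would feed a cross-entropy loss over the masked positions; the sketch only illustrates how span-level (rather than single-token) masking is constructed.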
2
Multilingual Autoregressive Entity Linking ...
BASE
3
Few-shot Learning with Multilingual Language Models ...
BASE
4
DESCGEN: A Distantly Supervised Dataset for Generating Entity Descriptions ...
BASE
5
Prompting Contrastive Explanations for Commonsense Reasoning Tasks ...
BASE
6
Detecting Hallucinated Content in Conditional Neural Sequence Generation ...
BASE
7
FaVIQ: FAct Verification from Information-seeking Questions ...
BASE
8
Quantifying Adaptability in Pre-trained Language Models with 500 Tasks ...
BASE
9
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment ...
BASE
10
Do Syntactic Probes Probe Syntax? Experiments with Jabberwocky Probing
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2021)
BASE
11
What About the Precedent: An Information-Theoretic Analysis of Common Law
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2021)
BASE
12
Finding Concept-specific Biases in Form–Meaning Associations
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2021)
BASE
13
A Non-Linear Structural Probe
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2021)
BASE
14
How (Non-)Optimal is the Lexicon?
In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2021)
BASE
15
Bilingual Lexicon Induction via Unsupervised Bitext Construction and Word Alignment ...
BASE
16
Backtranslation feedback improves user confidence in MT, not quality
Obregón, Mateo; Fomicheva, Marina; Novák, Michal. - : Association for Computational Linguistics, 2021
BASE
17
Nearest Neighbor Machine Translation ...
BASE
18
Pre-training via Paraphrasing ...
BASE
19
Multilingual Denoising Pre-training for Neural Machine Translation ...
Liu, Yinhan; Gu, Jiatao; Goyal, Naman. - : arXiv, 2020
BASE
20
De-noising Sequence-to-Sequence Pre-training ...
BASE


Results by source type:
Catalogues: 1
Bibliographies: 1
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 44