
Search in the Catalogues and Directories

Hits 1,201 – 1,220 of 1,255

1201 Paths to Relation Extraction through Semantic Structure ... (BASE)
1202 Rule Augmented Unsupervised Constituency Parsing ... (BASE)
1203 Transition-based Bubble Parsing: Improvements on Coordination Structure Prediction ... (BASE)
1204 Dodrio: Exploring Transformer Models with Interactive Visualization ... (BASE)
1205 Vyākarana: A Colorless Green Benchmark for Syntactic Evaluation in Indic Languages ... (BASE)
1206 What if This Modified That? Syntactic Interventions with Counterfactual Embeddings ... (BASE)
1207 Annotations Matter: Leveraging Multi-task Learning to Parse UD and SUD ... (BASE)
1208 The Limitations of Limited Context for Constituency Parsing ... (BASE)
1209 Effective Batching for Recurrent Neural Network Grammars ... (BASE)
1210 Factorising Meaning and Form for Intent-Preserving Paraphrasing ... (BASE)
1211 Infusing Finetuning with Semantic Dependencies ... (BASE)
1212 7D: Syntax: Tagging, Chunking, and Parsing #1 ... (BASE)
1213 OntoGUM: Evaluating Contextualized SOTA Coreference Resolution on 12 More Genres ... (BASE)
1214 Topicalization in Language Models: A Case Study on Japanese ... (BASE)
1215 An In-depth Study on Internal Structure of Chinese Words ... (BASE)
1216 To Point or Not to Point: Understanding How Abstractive Summarizers Paraphrase Text ... (BASE)
1217 ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information ... (BASE)
1218 When Do You Need Billions of Words of Pretraining Data? ... (BASE)
Abstract: NLP is currently dominated by language models like RoBERTa, which are pretrained on billions of words. But what exact knowledge or skills do Transformer LMs learn from large-scale pretraining that they cannot learn from less data? To explore this question, we adopt five styles of evaluation: classifier probing, information-theoretic probing, unsupervised relative acceptability judgments, unsupervised language model knowledge probing, and fine-tuning on NLU tasks. We then draw learning curves that track the growth of these different measures of model ability with respect to pretraining data volume, using the MiniBERTas, a group of RoBERTa models pretrained on 1M, 10M, 100M and 1B words. We find that these LMs require only about 10M to 100M words to learn to reliably encode most syntactic and semantic features we test. They need a much larger quantity of data in order to acquire enough commonsense knowledge and other skills required to master ...
Read paper: https://www.aclanthology.org/2021.acl-long.90
Keywords: Computational Linguistics; Condensed Matter Physics; Deep Learning; Electromagnetism; FOS Physical sciences; Information and Knowledge Engineering; Neural Network; Semantics
URL: https://underline.io/lecture/25974-when-do-you-need-billions-of-words-of-pretraining-dataquestion
DOI: https://dx.doi.org/10.48448/ngkd-rr69
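
The abstract above compares models pretrained on different data volumes via, among other things, unsupervised language model knowledge probing. Below is a minimal sketch of that kind of comparison, not the authors' code: it queries a few RoBERTa-style checkpoints with a single fill-mask agreement probe. The nyu-mll MiniBERTa model identifiers are assumptions about the publicly released checkpoints and may need adjusting.

# Hedged sketch: compare masked-token predictions across RoBERTa checkpoints
# pretrained on different data volumes (MiniBERTas vs. full-scale RoBERTa).
# The nyu-mll/* model IDs are assumed; substitute whatever checkpoints you have.
from transformers import pipeline

checkpoints = [
    "nyu-mll/roberta-med-small-1M-1",  # ~1M words of pretraining (assumed ID)
    "nyu-mll/roberta-base-100M-1",     # ~100M words of pretraining (assumed ID)
    "roberta-base",                    # full-scale baseline (billions of words)
]

# A simple subject-verb agreement probe: a well-trained LM should prefer "are".
prompt = "The keys to the cabinet <mask> on the table."

for name in checkpoints:
    fill = pipeline("fill-mask", model=name)
    top = fill(prompt)[0]  # highest-probability completion
    print(f"{name}: '{top['token_str'].strip()}' (p={top['score']:.3f})")

Smaller-data checkpoints will often still handle simple agreement cases, which would be consistent with the abstract's finding that most syntactic features are learned from roughly 10M to 100M words.
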
1219 Bridge-Based Active Domain Adaptation for Aspect Term Extraction ... (BASE)
1220 Recursive Tree-Structured Self-Attention for Answer Sentence Selection ... (BASE)


Hit sources: Catalogues 0 · Bibliographies 0 · Linked Open Data catalogues 0 · Online resources 0 · Open access documents 1,255