
Search in the Catalogues and Directories

Hits 1 – 18 of 18

1. Universal Dependencies 2.9 (BASE)
   Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. Universal Dependencies Consortium, 2021.
2. Universal Dependencies 2.8.1 (BASE)
   Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. Universal Dependencies Consortium, 2021.
3. Universal Dependencies 2.8 (BASE)
   Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. Universal Dependencies Consortium, 2021.
4. BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models ... (BASE)
5. Including Signed Languages in Natural Language Processing ... (BASE)
6. Including Signed Languages in Natural Language Processing ... (BASE)
7. Contrastive Explanations for Model Interpretability ... (BASE)
8. Provable Limitations of Acquiring Meaning from Ungrounded Form: What will Future Language Models Understand? ... (BASE)
9. Measuring and Improving Consistency in Pretrained Language Models ... (BASE)
10. Aligning Faithful Interpretations with their Social Attribution ... (BASE)
11. Amnesic Probing: Behavioral Explanation With Amnesic Counterfactuals ... (BASE)
12. Data Augmentation for Sign Language Gloss Translation ... (BASE)
13. Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent ... (BASE)
    Anthology paper link: https://aclanthology.org/2021.emnlp-main.133/
    Abstract: The capacity of neural networks like the widely adopted transformer is known to be very high. Evidence is emerging that they learn successfully due to inductive bias in the training routine, typically a variant of gradient descent (GD). To better understand this bias, we study the tendency for transformer parameters to grow in magnitude ($\ell_2$ norm) during training, and its implications for the emergent representations within self-attention layers. Empirically, we document norm growth in the training of transformer language models, including T5 during its pretraining. As the parameters grow in magnitude, we prove that the network approximates a discretized network with saturated activation functions. Such "saturated" networks are known to have a reduced capacity compared to the full network family that can be described in terms of formal languages and automata. Our results suggest saturation is a new characterization of an ...
    Keywords: Language Models; Natural Language Processing; Semantic Evaluation; Sociolinguistics
    URL: https://underline.io/lecture/37533-effects-of-parameter-norm-growth-during-transformer-training-inductive-bias-from-gradient-descent
         https://dx.doi.org/10.48448/2yr8-q466
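    As a rough illustration of the quantity this abstract studies, the sketch below (not from the paper; it assumes a PyTorch model and a generic training loop, with loss_fn and the batch keys as hypothetical stand-ins) logs the global $\ell_2$ norm of all parameters after each optimizer step, i.e. the value whose growth the authors document empirically.

        # Minimal sketch, assuming a PyTorch model; not the authors' code.
        import torch

        def global_l2_norm(model: torch.nn.Module) -> float:
            # Concatenate all parameters into one flat vector and take its l2 norm.
            flat = torch.nn.utils.parameters_to_vector(model.parameters())
            return float(flat.detach().norm(p=2))

        def training_step(model, optimizer, loss_fn, batch):
            # One gradient-descent update; loss_fn and the batch keys are hypothetical.
            optimizer.zero_grad()
            loss = loss_fn(model(batch["inputs"]), batch["targets"])
            loss.backward()
            optimizer.step()
            # The paper's empirical observation: this norm tends to grow over the
            # course of training, e.g. for T5 during its pretraining.
            return loss.item(), global_l2_norm(model)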
14. Asking It All: Generating Contextualized Questions for any Semantic Role ... (BASE)
15. Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction ... (BASE)
16. Neural Extractive Search ... (BASE)
17. Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction ... (BASE)
18. Ab Antiquo: Neural Proto-language Reconstruction ... (BASE)
    NAACL 2021; Goldberg, Yoav; Meloni, Carlo. Underline Science Inc., 2021.

Hit counts by source: Catalogues 0; Bibliographies 0; Linked Open Data catalogues 0; Online resources 0; Open access documents 18.