9. Finding Concept-specific Biases in Form–Meaning Associations
   Source: BASE
10. Searching for Search Errors in Neural Morphological Inflection
11. Applying the Transformer to Character-level Transduction
12. Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models
17. Examining the Inductive Bias of Neural Language Models with Artificial Languages
20. Differentiable Subset Pruning of Transformer Heads
    Abstract: Multi-head attention, a collection of several attention mechanisms that independently attend to different parts of the input, is the key ingredient in the Transformer. Recent work has shown, however, that a large proportion of the heads in a Transformer's multi-head attention mechanism can be safely pruned away without significantly harming the performance of the model; such pruning leads to models that are noticeably smaller and faster in practice. Our work introduces a new head pruning technique that we term differentiable subset pruning. Intuitively, our method learns per-head importance variables and then enforces a user-specified hard constraint on the number of unpruned heads. The importance variables are learned via stochastic gradient descent. We conduct experiments on natural language inference and machine translation; we show that differentiable subset pruning performs comparably or better than previous works while offering precise control of the sparsity level. ...
    Keywords: Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing
    URL: https://underline.io/lecture/38190-differentiable-subset-pruning-of-transformer-heads
    DOI: https://dx.doi.org/10.48448/bk2x-zy23
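The pruning recipe described in the abstract above has two moving parts: per-head importance variables learned by stochastic gradient descent, and a hard, user-specified budget on how many heads survive. A minimal sketch of that idea follows; the toy per-head utility values, the sigmoid relaxation, and the learning rate are all illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def topk_head_mask(logits, k):
    """Hard 0/1 mask keeping exactly the k highest-scoring heads."""
    mask = np.zeros_like(logits)
    mask[np.argsort(logits)[-k:]] = 1.0
    return mask

rng = np.random.default_rng(0)
n_heads, k, lr = 12, 4, 0.1
utility = rng.normal(size=n_heads)  # synthetic stand-in for each head's usefulness
logits = np.zeros(n_heads)          # learnable per-head importance variables

for _ in range(200):
    gates = 1.0 / (1.0 + np.exp(-logits))  # soft relaxation of the hard mask
    # gradient of the toy loss -(utility . gates) with respect to the logits
    grad = -utility * gates * (1.0 - gates)
    logits -= lr * grad                    # plain SGD step

mask = topk_head_mask(logits, k)  # exactly k heads survive, as requested
```

The paper's actual method uses a more careful differentiable top-k relaxation; the sketch only shows why the hard constraint gives precise control of the sparsity level, in contrast to penalty-based pruning where the final head count is indirect.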