Page: 1 2 3 4 5 6 7 8... 47
61. Incorporating medical knowledge in BERT for clinical relation extraction ...

62. Crosslingual Transfer Learning for Relation and Event Extraction via Word Category and Class Alignments ...

63. ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning ...

64. A Partition Filter Network for Joint Entity and Relation Extraction ...

65. #WhyDidTheyStay: An NLP-driven approach to analyzing the factors that affect domestic violence victims ...

66. Learning Prototype Representations Across Few-Shot Tasks for Event Detection ...

67. Uncovering Main Causalities for Long-tailed Information Extraction ...
68. Transformer Feed-Forward Layers Are Key-Value Memories

    Anthology paper link: https://aclanthology.org/2021.emnlp-main.446/

    Abstract: Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where each key correlates with textual patterns in the training examples, and each value induces a distribution over the output vocabulary. Our experiments show that the learned patterns are human-interpretable, and that lower layers tend to capture shallow patterns, while upper layers learn more semantic ones. The values complement the keys' input patterns by inducing output distributions that concentrate probability mass on tokens likely to appear immediately after each pattern, particularly in the upper layers. Finally, we demonstrate that the output of a feed-forward layer is a composition of its memories, which is subsequently refined throughout the model's layers via residual connections to ...

    Keywords: Computational Linguistics; Information Extraction; Machine Learning; Machine Learning and Data Mining; Natural Language Processing

    URL: https://underline.io/lecture/37500-transformer-feed-forward-layers-are-key-value-memories
    DOI: https://dx.doi.org/10.48448/69nd-5363
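The key-value reading in the abstract above maps directly onto the standard two-matrix FFN: the rows of the first weight matrix act as keys matched against the hidden state, and the rows of the second act as values that are summed with the resulting coefficients. A minimal NumPy sketch of that view follows; all dimensions, matrices, and variable names here are hypothetical illustrations, not the paper's actual setup.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical tiny dimensions: d = model width, m = number of "memories", vocab = vocabulary size.
rng = np.random.default_rng(0)
d, m, vocab = 8, 16, 10

x = rng.normal(size=d)           # hidden state for one token position
K = rng.normal(size=(m, d))      # first FFN matrix: each row is a "key"
V = rng.normal(size=(m, d))      # second FFN matrix: each row is a "value"

# Standard transformer FFN, read as a memory lookup:
coeffs = np.maximum(K @ x, 0.0)  # ReLU memory coefficients: how strongly x matches each key
ffn_out = coeffs @ V             # output = coefficient-weighted composition of the value vectors

# The paper's interpretive step: project each value through an (assumed) output
# embedding matrix E to see which vocabulary distribution that value induces.
E = rng.normal(size=(vocab, d))
value_dists = np.apply_along_axis(softmax, 1, V @ E.T)  # one distribution per memory row
```

With this framing, "composition of its memories" is just the weighted sum `coeffs @ V`, and inspecting `value_dists` row by row is how one asks which next tokens each individual memory promotes.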
70. Machine Reading Comprehension as Data Augmentation: A Case Study on Implicit Event Argument Extraction ...

71. A Relation-Oriented Clustering Method for Open Relation Extraction ...

72. Time-dependent Entity Embedding is not All You Need: A Re-evaluation of Temporal Knowledge Graph Completion Models under a Unified Framework ...

73. MasakhaNER: Named Entity Recognition for African Languages ...

74. The Future is not One-dimensional: Complex Event Schema Induction by Graph Modeling for Event Prediction ...

76. Unsupervised Relation Extraction: A Variational Autoencoder Approach ...

77. Instance-adaptive training with noise-robust losses against noisy labels ...

78. Gradient Imitation Reinforcement Learning for Low Resource Relation Extraction ...

79. Extracting Material Property Measurement Data from Scientific Articles ...