
Search in the Catalogues and Directories

Hits 1 – 20 of 28

1
Finetuning Pretrained Transformers into RNNs ...
Anthology paper link: https://aclanthology.org/2021.emnlp-main.830/
Abstract: Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. This comes with a significant computational overhead, as the attention mechanism scales with quadratic complexity in sequence length. Efficient transformer variants have received increasing interest in recent works. Among them, a linear-complexity recurrent variant has proven well suited for autoregressive generation. It approximates the softmax attention with randomized or heuristic feature maps, but can be difficult to train or may yield suboptimal accuracy. This work aims to convert a pretrained transformer into its efficient recurrent counterpart, improving efficiency while retaining accuracy. Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune. With a learned feature map, our ...
Keywords: Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing; Neural Network
URL: https://dx.doi.org/10.48448/w4sb-sz82
https://underline.io/lecture/37314-finetuning-pretrained-transformers-into-rnns
BASE
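
The abstract above describes replacing quadratic softmax attention with a linear-complexity, kernelized alternative inside a pretrained transformer before finetuning. The sketch below is a minimal NumPy illustration of that contrast, not the paper's implementation: it uses the heuristic elu+1 feature map from earlier linear-transformer work rather than the learned feature map the paper proposes, and it shows the non-causal form; autoregressive generation would use the equivalent recurrent, causally masked computation.

import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: softmax(Q K^T / sqrt(d)) V.
    # Materializes an (n, n) score matrix, hence quadratic in sequence length n.
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def elu_plus_one(x):
    # Positive feature map phi(x) = elu(x) + 1: a common heuristic stand-in
    # for the learned feature map used in the paper.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    # Kernelized attention: phi(Q) (phi(K)^T V) / (phi(Q) (phi(K)^T 1)).
    # Associativity lets us form the small (d, d_v) summary phi(K)^T V once,
    # so the cost is O(n d^2) instead of O(n^2 d).
    Qf, Kf = elu_plus_one(Q), elu_plus_one(K)
    kv = Kf.T @ V                 # (d, d_v) key-value summary
    z = Kf.sum(axis=0)            # (d,) normalizer
    return (Qf @ kv) / (Qf @ z)[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 4
    Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
    print(softmax_attention(Q, K, V)[0])
    print(linear_attention(Q, K, V)[0])   # approximates, does not match exactly

Because the key-value summary is a fixed-size state that can be updated token by token, the causal version of this computation runs as an RNN at generation time, which is what makes the converted model efficient for autoregressive decoding.
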
2
Sentence Bottleneck Autoencoders from Transformer Language Models ...
BASE
3
Grounded Compositional Outputs for Adaptive Language Modeling ...
BASE
4
Regularization Advantages of Multilingual Neural Language Models for Low Resource Domains ...
BASE
5
Stacked Neural Networks with Parameter Sharing for Multilingual Language Modeling
In: http://infoscience.epfl.ch/record/272000 (2019)
BASE
6
GILE: A Generalized Input-Label Embedding for Text Classification
In: Transactions of the Association for Computational Linguistics, Vol. 7, pp. 139-155 (2019)
BASE
7
Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation ...
BASE
8
Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation ...
BASE
9
Integrating Weakly Supervised Word Sense Disambiguation into Neural Machine Translation ...
BASE
10
Multilingual Hierarchical Attention Networks for Document Classification ...
BASE
11
The Summa Platform Prototype ...
BASE
12
Multilingual Hierarchical Attention Networks for Document Classification ...
BASE
13
Multilingual Hierarchical Attention Networks for Document Classification ...
BASE
14
The Summa Platform Prototype ...
BASE
15
Self-Attentive Residual Decoder for Neural Machine Translation ...
BASE
16
Sense-Aware Statistical Machine Translation using Adaptive Context-Dependent Clustering ...
BASE
17
Sense-Aware Statistical Machine Translation using Adaptive Context-Dependent Clustering ...
BASE
18
Multilingual Hierarchical Attention Networks for Document Classification
In: http://infoscience.epfl.ch/record/231134 (2017)
BASE
19
Cross-lingual Transfer for News Article Labeling: Benchmarking Statistical and Neural Models
In: http://infoscience.epfl.ch/record/231130 (2017)
BASE
20
Evaluating Attention Networks for Anaphora Resolution
In: http://infoscience.epfl.ch/record/231846 (2017)
BASE

