1	Memory-efficient Transformers via Top-k Attention ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; ., Shaya; Berant, Jonathan; Ciprut, David; Dar, Guy; Gupta, Ankit. - : Underline Science Inc., 2021
	Abstract: Following the success of dot-product attention in Transformers, numerous approximations have recently been proposed to address its quadratic complexity with respect to the input length. While these variants are memory- and compute-efficient, they cannot be used directly with popular pre-trained language models trained with vanilla attention unless an expensive corrective pre-training stage is added. In this work, we propose a simple yet highly accurate approximation for vanilla attention. We process the queries in chunks, and for each query, compute the top-k scores with respect to the keys. Our approach offers several advantages: (a) its memory usage is linear in the input size, similar to linear attention variants such as Performer and RFA; (b) it is a drop-in replacement for vanilla attention that does not require any corrective pre-training; and (c) it can also lead to significant memory savings in the feed-forward layers after casting them into the familiar query-key-value framework. We evaluate the ...
	Keyword: Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing
	URL: https://underline.io/lecture/39767-memory-efficient-transformers-via-top-k-attention https://dx.doi.org/10.48448/djrt-d829
	BASE
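
The mechanism summarized in the abstract above (chunked queries, top-k scores per query, and the feed-forward layer recast as query-key-value attention) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes single-head attention with no masking or batching and a ReLU feed-forward layer, and the names `topk_attention`, `topk_ffn`, and `chunk_size` are hypothetical. Renormalizing the softmax over only the selected keys is also an assumption made for illustration.

```python
# Illustrative sketch only; not the paper's code.
import heapq
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    """Numerically stable softmax over a non-empty list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def topk_attention(Q, K, V, k, chunk_size):
    """Single-head attention keeping only the k largest scores per query.

    Queries are processed chunk_size at a time, so the dense score block
    never exceeds chunk_size x len(K) entries; after selection only k
    (index, score) pairs per query are used.
    """
    scale = 1.0 / math.sqrt(len(Q[0]))
    d_v = len(V[0])
    out = []
    for start in range(0, len(Q), chunk_size):
        chunk = Q[start:start + chunk_size]
        # Dense scores for this chunk only: chunk_size x n_keys.
        scores = [[dot(q, key) * scale for key in K] for q in chunk]
        for row in scores:
            # Indices of the k highest-scoring keys for this query.
            idx = heapq.nlargest(k, range(len(row)), key=row.__getitem__)
            # Assumption: softmax is renormalized over the selected keys only.
            weights = softmax([row[j] for j in idx])
            out.append([sum(w * V[j][c] for w, j in zip(weights, idx))
                        for c in range(d_v)])
    return out


def topk_ffn(X, W1, W2, k):
    """ReLU feed-forward layer read as attention: columns of W1 act as keys,
    rows of W2 as values, and only the k largest activations are kept."""
    keys = [list(col) for col in zip(*W1)]
    d_out = len(W2[0])
    out = []
    for x in X:
        h = [dot(x, key) for key in keys]
        idx = heapq.nlargest(k, range(len(h)), key=h.__getitem__)
        # ReLU plays the role of softmax; dropped units contribute nothing.
        out.append([sum(max(h[j], 0.0) * W2[j][c] for j in idx)
                    for c in range(d_out)])
    return out
```

Setting `k` to the number of keys keeps every score and reproduces exact softmax attention, and setting it to the hidden width reproduces the plain ReLU layer; smaller `k` trades accuracy for memory. Because each query is handled independently, the result does not depend on `chunk_size`, which only bounds the size of the temporary score block.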