Catalogue search
Hits 1 – 1 of 1
Optimizing Deeper Transformers on Small Datasets
The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 2021; Cao, Yanshuai; Cheung, Jackie Chi Kit; Huang, Chenyang; Kumar, Dhruv; Prince, Simon; Tang, Keyi; Xu, Peng; Yang, Wei; Zi, Wenjie. - Underline Science Inc., 2021
Abstract:
Read paper: https://www.aclanthology.org/2021.acl-long.163
It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train 48 layers of transformers, comprising 24 fine-tuned layers from pre-trained RoBERTa and 24 relation-aware layers trained from scratch. With fewer training steps and no task-specific pre-training, we obtain the state of the art performance on the challenging cross-domain Text-to-SQL parsing benchmark Spider. We achieve this by deriving a novel Data-dependent Transformer Fixed-update initialization scheme ...
Keyword:
Computational Linguistics; Condensed Matter Physics; Deep Learning; Electromagnetism; FOS Physical sciences; Information and Knowledge Engineering; Neural Network; Semantics
URL:
https://dx.doi.org/10.48448/ehsy-3055
https://underline.io/lecture/25482-optimizing-deeper-transformers-on-small-datasets
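The abstract above describes a depth-dependent initialization (DT-Fixup, building on the earlier T-Fixup work) that makes it possible to train 24 relation-aware transformer layers from scratch alongside 24 fine-tuned RoBERTa layers. As a rough illustration only, the following is a minimal PyTorch sketch of a T-Fixup-style downscaling of the residual-branch weights. The 0.67 * N^(-1/4) factor is the published T-Fixup encoder scaling, not the paper's exact data-dependent DT-Fixup constant, and the helper name tfixup_scale_ is hypothetical.

import torch.nn as nn

def tfixup_scale_(layer: nn.TransformerEncoderLayer, num_layers: int) -> None:
    # T-Fixup downscales the weights feeding each residual branch by
    # 0.67 * N^(-1/4), where N is the number of encoder layers.
    # (Illustrative stand-in for the paper's data-dependent scaling.)
    scale = 0.67 * num_layers ** -0.25
    for name, param in layer.named_parameters():
        # Scale the feed-forward weights and the attention output
        # projection; biases and layer-norm parameters are untouched.
        if param.dim() > 1 and ("linear" in name or "out_proj" in name):
            param.data.mul_(scale)
    # The value third of the fused QKV projection is scaled as well;
    # query and key projections keep their default initialization.
    qkv = layer.self_attn.in_proj_weight
    d = qkv.shape[0] // 3
    qkv.data[2 * d:].mul_(scale)

# 24 layers trained from scratch, matching the count in the abstract.
num_layers = 24
layers = [nn.TransformerEncoderLayer(d_model=512, nhead=8) for _ in range(num_layers)]
for layer in layers:
    tfixup_scale_(layer, num_layers)

Shrinking the residual branches in proportion to depth keeps early updates small, which is what lets very deep stacks train stably without warmup; the paper's DT-Fixup variant derives the scaling from the input data so that freshly initialized layers can be mixed with pre-trained ones.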
BASE