61 | Statistical and Spatio-temporal Hand Gesture Features for Sign Language Recognition using the Leap Motion Sensor ...
BASE

64 | Giant Pigeon and Small Person: Prompting Visually Grounded Models about the Size of Objects ...
Zhang, Yi. Purdue University Graduate School, 2022
BASE

65 | Giant Pigeon and Small Person: Prompting Visually Grounded Models about the Size of Objects ...
Zhang, Yi. Purdue University Graduate School, 2022
BASE

66 | pNLP-Mixer: an Efficient all-MLP Architecture for Language ...
BASE

67 | Multilingual Abusiveness Identification on Code-Mixed Social Media Text ...
BASE

68 | hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification ...
BASE

69 | StableMoE: Stable Routing Strategy for Mixture of Experts ...
BASE

70 | BERTuit: Understanding Spanish language in Twitter through a native transformer ...
BASE

71 | EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification ...
BASE

73 | Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning ...
BASE

74 | Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages ...
BASE

75 | Out of Thin Air: Is Zero-Shot Cross-Lingual Keyword Detection Better Than Unsupervised? ...
BASE

76 | Assessment of Massively Multilingual Sentiment Classifiers ...
BASE

78 | MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages ...
BASE

79 | DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition ...
BASE

80 | DeepNet: Scaling Transformers to 1,000 Layers ...
Abstract:
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DeepNorm) that modifies the residual connection in the Transformer, accompanied by a theoretically derived initialization. In-depth theoretical analysis shows that model updates can be bounded in a stable way. The proposed method combines the best of two worlds, i.e., the good performance of Post-LN and the stable training of Pre-LN, making DeepNorm a preferred alternative. We successfully scale Transformers up to 1,000 layers (i.e., 2,500 attention and feed-forward network sublayers) without difficulty, which is one order of magnitude deeper than previous deep Transformers. Remarkably, on a multilingual benchmark with 7,482 translation directions, our 200-layer model with 3.2B parameters significantly outperforms the 48-layer state-of-the-art model with 12B parameters by 5 BLEU points, which indicates a promising scaling direction. ... : Work in progress ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences; Machine Learning cs.LG
URL: https://dx.doi.org/10.48550/arxiv.2203.00555 https://arxiv.org/abs/2203.00555
BASE
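
The DeepNorm residual update mentioned in the abstract of entry 80 can be illustrated with a minimal sketch. The abstract itself gives no formulas, so the update rule LN(alpha * x + f(x)) and the encoder-only constants alpha = (2N)^(1/4) and beta = (8N)^(-1/4) are assumptions drawn from the linked paper and should be verified there. The function names are hypothetical, and the layer norm omits the learned gain and bias.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def deepnorm_constants(num_layers):
    """Assumed encoder-only constants for an N-layer model.

    alpha up-weights the residual branch; beta is meant to scale the
    initialization of some sublayer weights (not applied in this sketch).
    """
    alpha = (2 * num_layers) ** 0.25
    beta = (8 * num_layers) ** -0.25
    return alpha, beta


def deepnorm_residual(x, sublayer, alpha):
    """Post-LN style update with the residual input scaled by alpha."""
    fx = sublayer(x)
    return layer_norm([alpha * xi + fi for xi, fi in zip(x, fx)])


if __name__ == "__main__":
    alpha, beta = deepnorm_constants(num_layers=200)
    # A toy "sublayer" standing in for attention or a feed-forward block.
    toy_sublayer = lambda v: [0.1 * t for t in v]
    print(alpha, beta)
    print(deepnorm_residual([1.0, 2.0, 3.0, 4.0], toy_sublayer, alpha))
```

Setting alpha to 1 reduces the update to the ordinary Post-LN residual, which is a convenient sanity check when comparing the two variants.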