
Search in the Catalogues and Directories

Hits 1 – 16 of 16

1. SMDT: Selective Memory-Augmented Neural Document Translation
   Zhang, Xu; Yang, Jian; Huang, Haoyang. arXiv, 2022 (BASE)
2. StableMoE: Stable Routing Strategy for Mixture of Experts
   Dai, Damai; Dong, Li; Ma, Shuming. arXiv, 2022
3. DeepNet: Scaling Transformers to 1,000 Layers
Abstract: In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify the residual connection in the Transformer, accompanied by a theoretically derived initialization. In-depth theoretical analysis shows that model updates can be bounded in a stable way. The proposed method combines the best of two worlds, i.e., the good performance of Post-LN and the stable training of Pre-LN, making DeepNorm a preferred alternative. We successfully scale Transformers up to 1,000 layers (i.e., 2,500 attention and feed-forward network sublayers) without difficulty, which is one order of magnitude deeper than previous deep Transformers. Remarkably, on a multilingual benchmark with 7,482 translation directions, our 200-layer model with 3.2B parameters significantly outperforms the 48-layer state-of-the-art model with 12B parameters by 5 BLEU points, which indicates a promising scaling direction.
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences; Machine Learning (cs.LG)
URL: https://dx.doi.org/10.48550/arxiv.2203.00555
https://arxiv.org/abs/2203.00555
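The abstract describes DeepNorm as a rescaled residual connection, x ← LayerNorm(α·x + G(x)), paired with a derived initialization. As a minimal illustrative sketch (not the authors' code; α = (2N)^(1/4) is, to our understanding, the constant the paper derives for an N-layer encoder-only model, and the LayerNorm here omits learned gain and bias):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Plain LayerNorm over the last axis (no learned gain/bias)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def deepnorm_residual(x, sublayer_out, num_layers):
    """DeepNorm update: x <- LN(alpha * x + G(x)).

    alpha = (2N)^(1/4) is the constant reported for an N-layer
    encoder-only Transformer; it up-weights the residual branch so
    that per-layer model updates stay bounded as depth grows.
    """
    alpha = (2.0 * num_layers) ** 0.25
    return layer_norm(alpha * x + sublayer_out)

# Toy usage: one "sublayer" output combined with its input via DeepNorm.
x = np.random.default_rng(0).normal(size=(4, 8))
g_x = np.tanh(x)  # stand-in for an attention/FFN sublayer output
y = deepnorm_residual(x, g_x, num_layers=200)
```

The point of the α scaling is that, unlike plain Post-LN (α = 1), the residual stream dominates the sublayer contribution at large depth, which is what makes 1,000-layer training stable.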
4. Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt
5. On the Representation Collapse of Sparse Mixture of Experts
   Chi, Zewen; Dong, Li; Huang, Shaohan. arXiv, 2022
6. A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model
   Sun, Xin; Ge, Tao; Ma, Shuming. arXiv, 2022
7. Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation
   Chen, Guanhua; Ma, Shuming; Chen, Yun. arXiv, 2021
8. Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders
   Chen, Guanhua; Ma, Shuming; Chen, Yun. arXiv, 2021
9. MT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
   Chi, Zewen; Dong, Li; Ma, Shuming. arXiv, 2021
10. Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task
11. DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders
    Ma, Shuming; Dong, Li; Huang, Shaohan. arXiv, 2021
12. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
    Chi, Zewen; Huang, Shaohan; Dong, Li. arXiv, 2021
13. How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation?
14. XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders
15. Deconvolution-Based Global Decoding for Neural Machine Translation
    Lin, Junyang; Sun, Xu; Ren, Xuancheng. arXiv, 2018
16. A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification
    Ma, Shuming; Sun, Xu. arXiv, 2017

Open access documents: 16
© 2013 – 2024 Lin|gu|is|tik