2. On the Representation Collapse of Sparse Mixture of Experts ...
3. Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training ...
4. Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment ...
5. Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task ...
6. DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders ...
7. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA ...
8. Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains ...
9. MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers ...