1. On the Representation Collapse of Sparse Mixture of Experts ...
2. Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training ...
3. Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task ...
4. DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders ...
5. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA ...
6. InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training ...
7. XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders ...