1. CPM-2: Large-scale Cost-effective Pre-trained Language Models
2. Sub-Character Tokenization for Chinese Pretrained Language Models
3. MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
4. Better Robustness by More Coverage: Adversarial and Mixup Data Augmentation for Robust Finetuning
5. KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation