1 | Charformer: Fast Character Transformers via Gradient-based Subword Tokenization ... | BASE
3 | Do Transformer Modifications Transfer Across Implementations and Applications? ... | BASE
4 | A Simple and Effective Positional Encoding for Transformers ... | BASE
6 | Improving Multilingual Models with Language-Clustered Vocabularies ... | BASE
7 | Rethinking embedding coupling in pre-trained language models ... | BASE