1. Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
   In: https://hal.inria.fr/hal-03540069 ; 2022
   BASE

2. Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
   In: https://hal.inria.fr/hal-03550289 ; 2022
   BASE

3. Masader: Metadata Sourcing for Arabic Text and Speech Data Resources ...
   BASE

4. Evaluating Various Tokenizers for Arabic Text Classification ...
   BASE