3 | Listening to Affected Communities to Define Extreme Speech: Dataset and Experiments ... | BASE
4 | Differentiable Multi-Agent Actor-Critic for Multi-Step Radiology Report Summarization ... | BASE
9 | Graph Algorithms for Multiparallel Word Alignment | BASE
In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Nov 2021, Punta Cana, Dominican Republic ; https://hal.archives-ouvertes.fr/hal-03424044 ; https://2021.emnlp.org/ (2021)
11 | Does He Wink or Does He Nod? A Challenging Benchmark for Evaluating Word Understanding of Language Models ... | BASE
12 | Superbizarre Is Not Superb: Derivational Morphology Improves BERT's Interpretation of Complex Words ... | BASE
14 | ParCourE: A Parallel Corpus Explorer for a Massively Multilingual Corpus ... | BASE
15 | Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models ... | BASE
16 | Wine is Not v i n. -- On the Compatibility of Tokenizations Across Languages ... | BASE
Abstract: The size of the vocabulary is a central design choice in large pretrained language models, with respect to both performance and memory requirements. Typically, subword tokenization algorithms such as byte pair encoding and WordPiece are used. In this work, we investigate the compatibility of tokenizations for multilingual static and contextualized embedding spaces and propose a measure that reflects the compatibility of tokenizations across languages. Our goal is to prevent incompatible tokenizations, e.g., "wine" (word-level) in English vs. "v i n" (character-level) in French, which make it hard to learn good multilingual semantic representations. We show that our compatibility measure allows the system designer to create vocabularies across languages that are compatible, a desideratum that so far has been neglected in multilingual models. ... : Accepted at EMNLP 2021 Findings ...
Keywords: Computation and Language (cs.CL); Computer and information sciences (FOS)
URL: https://dx.doi.org/10.48550/arxiv.2109.05772 ; https://arxiv.org/abs/2109.05772
18 | Locating Language-Specific Information in Contextualized Embeddings ... | BASE
20 | Measuring and Improving Consistency in Pretrained Language Models ... | BASE