
Search in the Catalogues and Directories

Hits 1 – 16 of 16

1. Evaluating Multiway Multilingual NMT in the Turkic Languages (BASE)
2. Findings of the WMT 2021 Shared Task on Quality Estimation (BASE)
3. Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation (BASE)
4. Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters (BASE)
5. Robust Open-Vocabulary Translation from Visual Text Representations (BASE)
6. Contrastive Learning for Context-aware Neural Machine Translation Using Coreference Information (BASE)
7. To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation (BASE)
8. Identifying the Importance of Content Overlap for Better Cross-lingual Embedding Mappings (BASE)
9. Simultaneous Neural Machine Translation with Constituent Label Prediction (BASE)
10. Just Ask! Evaluating Machine Translation by Asking and Answering Questions (BASE)
11. An Analysis of Euclidean vs. Graph-Based Framing for Bilingual Lexicon Induction from Word Embedding Spaces (BASE)
12. Findings of the WMT Shared Task on Machine Translation Using Terminologies (BASE)
13. Translation Transformers Rediscover Inherent Data Domains (BASE)
14. Phrase-level Active Learning for Neural Machine Translation (BASE)
15. A Fine-Grained Analysis of BERTScore (BASE)
16. Wine is not v i n. On the Compatibility of Tokenizations across Languages (BASE)
Abstract: The size of the vocabulary is a central design choice in large pretrained language models, with respect to both performance and memory requirements. Typically, subword tokenization algorithms such as byte pair encoding and WordPiece are used. In this work, we investigate the compatibility of tokenizations for multilingual static and contextualized embedding spaces and propose a measure that reflects the compatibility of tokenizations across languages. Our goal is to prevent incompatible tokenizations, e.g., "wine" (word-level) in English vs. "v i n" (character-level) in French, which make it hard to learn good multilingual semantic representations. We show that our compatibility measure allows the system designer to create vocabularies across languages that are compatible -- a desideratum that so far has been neglected in multilingual models. ...
Keyword: Bilingual Lexicon Induction; Language Models; Natural Language Processing
URL: https://dx.doi.org/10.48448/4bn9-4p23
URL: https://underline.io/lecture/38413-wine-is-not-v-i-n.-on-the-compatibility-of-tokenizations-across-languages
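
The abstract's "wine" vs. "v i n" example can be made concrete with a minimal Python sketch. The vocabularies, the greedy longest-match tokenizer, and the granularity-gap score below are invented for illustration only; they are not the compatibility measure proposed in the paper, just a toy stand-in showing how the same concept can be split at word level in one language and at character level in another.

# Toy illustration of tokenization (in)compatibility across languages.
# NOTE: vocabularies and the score below are hypothetical, not the paper's measure.

def tokenize(word, vocab):
    # Greedy longest-match subword tokenization over a fixed vocabulary,
    # falling back to single characters (a simplified stand-in for BPE/WordPiece).
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):      # try the longest piece first
            piece = word[i:j]
            if piece in vocab or j == i + 1:   # unknown characters stay as characters
                tokens.append(piece)
                i = j
                break
    return tokens

# Hypothetical vocabularies: the English one contains the whole word "wine",
# the French one lacks "vin", so the word falls apart into characters.
en_vocab = {"wine", "win", "w", "i", "n", "e"}
fr_vocab = {"v", "i", "n"}

def granularity_gap(src, tgt, src_vocab, tgt_vocab):
    # Crude compatibility proxy: ratio of token counts for an aligned word pair
    # (1.0 = same granularity, larger = more incompatible).
    a = len(tokenize(src, src_vocab))
    b = len(tokenize(tgt, tgt_vocab))
    return max(a, b) / min(a, b)

print(tokenize("wine", en_vocab))                          # ['wine']
print(tokenize("vin", fr_vocab))                           # ['v', 'i', 'n']
print(granularity_gap("wine", "vin", en_vocab, fr_vocab))  # 3.0

A ratio of 3.0 for an aligned pair signals exactly the mismatch the abstract warns about; a measure of this kind could be computed over many aligned pairs when choosing multilingual vocabularies, though the paper's actual formulation should be consulted for the real method.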
