3. On the Copying Behaviors of Pre-Training for Neural Machine Translation ...
Read paper: https://www.aclanthology.org/2021.findings-acl.373
Abstract:
Previous studies have shown that initializing neural machine translation (NMT) models with the pre-trained language models (LM) can speed up the model training and boost the model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization would affect the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training based NMT models have a larger copying ratio than the standard one. In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding. Extensive experiments on both in-domain and out-of-domain ...
Keyword:
Computational Linguistics; Condensed Matter Physics; Deep Learning; Electromagnetism; FOS Physical sciences; Neural Network; Semantics
URL: https://underline.io/lecture/26464-on-the-copying-behaviors-of-pre-training-for-neural-machine-translation https://dx.doi.org/10.48448/94x1-rg05
5. Norm-Based Curriculum Learning for Neural Machine Translation ...
7. Shared-Private Bilingual Word Embeddings for Neural Machine Translation ...
8. Unsupervised Neural Dialect Translation with Commonality and Diversity Modeling ...
9. Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation ...
10. A Relationship: Word Alignment, Phrase Table, and Translation Quality
12. iSentenizer-μ: Multilingual Sentence Boundary Detection Model
13. Unsupervised Quality Estimation Model for English to German Translation and Its Application in Extensive Supervised Evaluation
14. A Systematic Comparison of Data Selection Criteria for SMT Domain Adaptation
15. Unsupervised Chunking Based on Graph Propagation from Bilingual Corpus