
Search in the Catalogues and Directories

Hits 1 – 20 of 34

1. Generating Authentic Adversarial Examples beyond Meaning-preserving with Doubly Round-trip Translation ... Lai, Siyu; Yang, Zhen; Meng, Fandong. arXiv, 2022. (BASE)
2. Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation ... (BASE)
3. MSCTD: A Multimodal Sentiment Chat Translation Dataset ... (BASE)
4. ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization ... Wang, Jiaan; Meng, Fandong; Lu, Ziyao. arXiv, 2022. (BASE)
5. A Survey on Cross-Lingual Summarization ... Wang, Jiaan; Meng, Fandong; Zheng, Duo. arXiv, 2022. (BASE)
6. Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation ... (BASE)
7. Modeling Bilingual Conversational Characteristics for Neural Chat Translation ... (BASE)
8. Sequence-Level Training for Non-Autoregressive Neural Machine Translation ... (BASE)
9. Competence-based Curriculum Learning for Multilingual Machine Translation ... (BASE)
10. MoEfication: Transformer Feed-forward Layers are Mixtures of Experts ... (BASE)
11. Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation ... (BASE)
Abstract: Recently, knowledge distillation (KD) has shown great success in BERT compression. Instead of only learning from the teacher's soft labels as in conventional KD, researchers have found that the rich information contained in the hidden layers of BERT is conducive to the student's performance. To better exploit the hidden knowledge, a common practice is to force the student to deeply mimic the teacher's hidden states of all the tokens in a layer-wise manner. In this paper, however, we observe that although distilling the teacher's hidden state knowledge (HSK) is helpful, the performance gain (marginal utility) diminishes quickly as more HSK is distilled. To understand this effect, we conduct a series of analyses. Specifically, we divide the HSK of BERT into three dimensions, namely depth, length and width. We first investigate a variety of strategies to extract crucial knowledge for each single dimension and then jointly compress the three ...
Keywords: Computational Linguistics; Condensed Matter Physics; Deep Learning; Electromagnetism; FOS Physical sciences; Information and Knowledge Engineering; Neural Network; Semantics
Paper: https://www.aclanthology.org/2021.acl-long.228
URL: https://underline.io/lecture/25558-marginal-utility-diminishes-exploring-the-minimum-knowledge-for-bert-knowledge-distillation
DOI: https://dx.doi.org/10.48448/t87s-3224
(A minimal code sketch of the layer-wise hidden-state distillation mentioned in this abstract follows the hit list below.)
12. CLEVE: Contrastive Pre-training for Event Extraction ... (BASE)
13. Towards Making the Most of Dialogue Characteristics for Neural Chat Translation ... (BASE)
14. Rethinking Stealthiness of Backdoor Attack against NLP Models ... (BASE)
15. Prevent the Language Model from being Overconfident in Neural Machine Translation ... (BASE)
16. KACC: A Multi-task Benchmark for Knowledge Abstraction, Concretization and Completion ... (BASE)
17. Text AutoAugment: Learning Compositional Augmentation Policy for Text Classification ... (BASE)
18. RAP: Robustness-Aware Perturbations for Defending against Backdoor Attacks on NLP Models ... (BASE)
19. Modeling Bilingual Conversational Characteristics for Neural Chat Translation ... (BASE)
20. Target-oriented Fine-tuning for Zero-Resource Named Entity Recognition ... (BASE)
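
The abstract of hit 11 refers to the common practice of forcing a BERT student to mimic the teacher's hidden states layer by layer. The following is a minimal sketch of that baseline in PyTorch with Hugging Face transformers, not the paper's own method: the student checkpoint, the uniform depth mapping (student layer i mimics teacher layer i * stride), and the plain MSE objective are illustrative assumptions.

    # Minimal sketch of layer-wise hidden-state knowledge distillation (HSK)
    # for BERT compression, as described in hit 11's abstract. The student
    # checkpoint, depth mapping, and MSE objective are assumptions, not the
    # paper's method.
    import torch
    import torch.nn.functional as F
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    teacher = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
    # 4-layer student with the same hidden width (768), so no projection is needed.
    student = AutoModel.from_pretrained("google/bert_uncased_L-4_H-768_A-12",
                                        output_hidden_states=True)
    teacher.eval()

    def hsk_distillation_loss(batch_texts):
        enc = tokenizer(batch_texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            t_states = teacher(**enc).hidden_states   # embeddings + 12 layer outputs
        s_states = student(**enc).hidden_states       # embeddings + 4 layer outputs

        # Uniform depth mapping: student layer i mimics teacher layer i * stride.
        n_student = len(s_states) - 1
        stride = (len(t_states) - 1) // n_student
        loss = 0.0
        for i in range(1, n_student + 1):
            loss = loss + F.mse_loss(s_states[i], t_states[i * stride])
        return loss / n_student

    # Usage: loss = hsk_distillation_loss(["an example sentence"]); loss.backward()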


Hit distribution by source: Catalogues: 2; Bibliographies: 1; Linked Open Data catalogues: 0; Online resources: 0; Open access documents: 31.