
Search in the Catalogues and Directories

Hits 1 – 20 of 32

1. Characterizing News Portrayal of Civil Unrest in Hong Kong, 1998–2020 ... (BASE)
2. Grounded Neural Generation ... (BASE)
3. ArgFuse: A Weakly-Supervised Framework for Document-Level Event Argument Aggregation ... (BASE)
4. Question Answering over Text and Tables ... (BASE)
5. Reordering Examples Helps during Priming-based Few-Shot Learning ... (BASE)
6. One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers ... (BASE)
Abstract: Pre-trained language models (PLMs) achieve great success in NLP. However, their huge model sizes hinder their applications in many practical systems. Knowledge distillation is a popular technique to compress PLMs, which learns a small student model from a large teacher PLM. However, the knowledge learned from a single teacher may be limited and even biased, resulting in a low-quality student model. In this paper, we propose a multi-teacher knowledge distillation framework named MTBERT for pre-trained language model compression, which can train a high-quality student model from multiple teacher PLMs. In MTBERT we design a multi-teacher co-finetuning method to jointly finetune multiple teacher PLMs on downstream tasks with shared pooling and prediction layers to align their output space for better collaborative teaching. In addition, we propose a multi-teacher hidden loss and a multi-teacher distillation loss to transfer the useful knowledge in both hidden states and soft labels from multiple teacher PLMs to the ...
Keyword: Computational Linguistics; Condensed Matter Physics; FOS Physical sciences; Information and Knowledge Engineering; Machine Learning; Neural Network; Semantics
URL: https://dx.doi.org/10.48448/nsy4-yg49
https://underline.io/lecture/29881-one-teacher-is-enoughquestion-pre-trained-language-model-distillation-from-multiple-teachers
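The abstract above names two transfer signals, a multi-teacher hidden loss and a multi-teacher distillation loss. Below is a minimal PyTorch-style sketch of how such losses are commonly combined; the function name, tensor shapes, temperature, weighting, and plain averaging over teachers are illustrative assumptions, not details taken from the paper.

# Hypothetical sketch of multi-teacher distillation losses (not MTBERT's actual code).
# Assumes the student and every teacher expose pooled hidden states of the same
# dimension and logits over the same label set, and that teachers are weighted equally.
import torch
import torch.nn.functional as F

def multi_teacher_distill_loss(student_hidden, student_logits,
                               teacher_hiddens, teacher_logits_list,
                               temperature=2.0, alpha=0.5):
    # Hidden-state loss: MSE between the student's pooled hidden state and each
    # teacher's pooled hidden state, averaged over teachers.
    hidden_loss = torch.stack(
        [F.mse_loss(student_hidden, t_h) for t_h in teacher_hiddens]
    ).mean()

    # Soft-label loss: KL divergence between temperature-softened student and
    # teacher output distributions, averaged over teachers.
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = torch.stack(
        [F.kl_div(log_p_student,
                  F.softmax(t_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
         for t_logits in teacher_logits_list]
    ).mean()

    # Weighted combination of the two signals.
    return alpha * hidden_loss + (1 - alpha) * soft_loss

# Toy usage: batch of 4, hidden size 8, 3 labels, 2 teachers.
if __name__ == "__main__":
    s_h, s_l = torch.randn(4, 8), torch.randn(4, 3)
    t_hs = [torch.randn(4, 8) for _ in range(2)]
    t_ls = [torch.randn(4, 3) for _ in range(2)]
    print(multi_teacher_distill_loss(s_h, s_l, t_hs, t_ls))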
7. Extracting Events from Industrial Incident Reports ... (BASE)
8. On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers ... (BASE)
9. Knowledge-based neural pre-training for Intelligent Document Management ... (BASE)
AIxIA 2021; Basili, Roberto. Underline Science Inc., 2021
10. Easy Semantification of Bioassays ... (BASE)
AIxIA 2021; Anteghini, Marco. Underline Science Inc., 2021
11. Improving Machine Translation of Arabic Dialects through Multi-Task Learning ... (BASE)
AIxIA 2021; Moukafih, Youness. Underline Science Inc., 2021
12. Automatic Learning Assistant in Telugu ... (BASE)
13. Team “NoConflict” at CASE 2021 Task 1: Pretraining for Sentence-Level Protest Event Detection ... (BASE)
14. DAAI at CASE 2021 Task 1: Transformer-based Multilingual Socio-political and Crisis Event Detection ... (BASE)
15. Modality and Negation in Event Extraction ... (BASE)
16. Hell Hath No Fury? Correcting Bias in the NRC Emotion Lexicon ... (BASE)
17. System Description for the CommonGen task with the POINTER model ... (BASE)
18. Compositional Lexical Semantics In Natural Language Inference (BASE)
In: Publicly Accessible Penn Dissertations (2017)
19. A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations (BASE)
20. The Latent Relation Mapping Engine: Algorithm and Experiments [Journal] (BASE)
Turney, Peter D. AI Access Foundation


Hit counts by source type: Catalogues 0; Bibliographies 0; Linked Open Data catalogues 0; Online resources 0; Open access documents 32.