401. Modeling Transitions of Focal Entities for Conversational Knowledge Base Question Answering
402. One Semantic Parser to Parse Them All: Sequence to Sequence Multi-Task Learning on Semantic Parsing Datasets
403. Fine-Grained Spatial Information Extraction in Radiology as Two-turn Question Answering
406. The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes
407. Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification
408. Towards Zero-Shot Knowledge Distillation for Natural Language Processing

Anthology: https://aclanthology.org/2021.emnlp-main.526/

Abstract: Knowledge distillation (KD) is a common knowledge-transfer algorithm used for model compression across a variety of deep-learning-based natural language processing (NLP) solutions. In its regular manifestations, KD requires access to the teacher's training data for knowledge transfer to the student network. However, privacy concerns, data regulations, and proprietary reasons may prevent access to such data. We present, to the best of our knowledge, the first work on zero-shot knowledge distillation for NLP, where the student learns from the much larger teacher without any task-specific data. Our solution combines out-of-domain data and adversarial training to learn the teacher's output distribution. We investigate six tasks from the GLUE benchmark and demonstrate that we can achieve between 75% and 92% of the teacher's classification score (accuracy or F1) while compressing the model 30 times.

Keywords: Computational Linguistics; Machine Learning; Machine Learning and Data Mining; Natural Language Processing

URL: https://underline.io/lecture/38084-towards-zero-shot-knowledge-distillation-for-natural-language-processing
DOI: https://dx.doi.org/10.48448/yfd6-g336
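The abstract above builds on the standard distillation objective of matching the teacher's output distribution. As a point of reference, here is a minimal sketch of the generic Hinton-style KD loss (KL divergence between temperature-softened teacher and student distributions); this is not the paper's zero-shot variant, which additionally relies on out-of-domain data and adversarial training, and the function names here are illustrative only.

```python
import numpy as np

def softened_softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    z = logits / temperature
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Generic distillation loss: KL(teacher || student) over
    temperature-softened outputs, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softened_softmax(teacher_logits, temperature)  # teacher distribution
    q = softened_softmax(student_logits, temperature)  # student distribution
    kl = np.sum(p * (np.log(p) - np.log(q)))
    return (temperature ** 2) * kl

# A student that exactly matches the teacher incurs (near-)zero loss;
# a mismatched student incurs a strictly positive loss.
t = np.array([3.0, 1.0, -2.0])
assert kd_loss(t, t) < 1e-9
assert kd_loss(t, np.array([-2.0, 1.0, 3.0])) > 0.0
```

In the zero-shot setting the abstract describes, the teacher's logits cannot be queried on its (inaccessible) training data, so surrogate inputs (out-of-domain text, adversarially generated examples) stand in for it while minimizing a loss of this general shape.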
410. BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks
411. SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations
412. Automatic Text Evaluation through the Lens of Wasserstein Barycenters
413. Guiding the Growth: Difficulty-Controllable Question Generation through Step-by-Step Rewriting
414. PSED: A Dataset for Selecting Emphasis in Presentation Slides
416. Combining sentence and table evidence to predict veracity of factual claims using TaPaS and RoBERTa
417. Meta Distant Transfer Learning for Pre-trained Language Models
418. Generating SOAP Notes from Doctor-Patient Conversations Using Modular Summarization Techniques
419. Plot and Rework: Modeling Storylines for Visual Storytelling
420. SaRoCo: Detecting Satire in a Novel Romanian Corpus of News Articles