
Search in the Catalogues and Directories

Hits 1 – 13 of 13

1. Pathologies of Pre-trained Language Models in Few-shot Fine-tuning ... (BASE)
2. MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning ... (BASE)
3. A Dataset and Baselines for Multilingual Reply Suggestion ... (BASE)
4. A Conditional Generative Matching Model for Multi-lingual Reply Suggestion ... (BASE)
5. XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation ... (BASE)
6. Say `YES' to Positivity: Detecting Toxic Language in Workplace Communications ... (BASE)
7. A Conditional Generative Matching Model for Multi-lingual Reply Suggestion ... (BASE)
8. Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer ... (BASE)
9. XtremeDistil: Multi-stage Distillation for Massive Multilingual Models ... (BASE)
Abstract: Deep and large pre-trained language models are the state of the art for various natural language processing tasks. However, the huge size of these models can be a deterrent to using them in practice. Some recent and concurrent works use knowledge distillation to compress these huge models into shallow ones. In this work we study knowledge distillation with a focus on multilingual Named Entity Recognition (NER). In particular, we study several distillation strategies and propose a stage-wise optimization scheme that leverages teacher internal representations, is agnostic of the teacher architecture, and outperforms strategies employed in prior works. Additionally, we investigate the role of factors such as the amount of unlabeled data, annotation resources, model architecture, and inference latency. We show that our approach leads to massive compression of MBERT-like teacher models, up to 35x in parameters and 51x in latency for batch inference, while retaining 95% ... (To appear in ACL 2020.) A rough code sketch of the stage-wise distillation idea appears after the hit list.
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences; Machine Learning (cs.LG)
URL: https://arxiv.org/abs/2004.05686
DOI: https://dx.doi.org/10.48550/arxiv.2004.05686
10. Smart To-Do: Automatic Generation of To-Do Items from Emails ... (BASE)
11. Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data ... (BASE)
12. Multi-Source Cross-Lingual Model Transfer: Learning What to Share ... (BASE)
13. Identifying Roles in Social Networks using Linguistic Analysis. (BASE)
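
The XtremeDistil abstract (hit 9) describes a stage-wise distillation scheme that first matches teacher internal representations and then teacher label distributions. The following is a minimal sketch of that general idea only, assuming PyTorch; the student architecture, loss choices, and every hyperparameter below are illustrative assumptions, not the paper's actual implementation.

    # Illustrative stage-wise knowledge distillation sketch (not XtremeDistil itself).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BiLSTMStudent(nn.Module):
        """Shallow student whose hidden states are projected into the teacher's space."""
        def __init__(self, vocab_size=30000, hidden=300, teacher_hidden=768, num_labels=9):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden)
            self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
            self.project = nn.Linear(2 * hidden, teacher_hidden)  # align dims with teacher
            self.classifier = nn.Linear(teacher_hidden, num_labels)

        def forward(self, token_ids):
            x, _ = self.encoder(self.embed(token_ids))
            rep = self.project(x)             # per-token representation, teacher-sized
            return rep, self.classifier(rep)  # (representations, logits)

    def representation_loss(student_rep, teacher_rep):
        # Stage 1: regress student states onto teacher internal representations.
        # Only the teacher's outputs are consumed, so this step does not depend
        # on the teacher's architecture.
        return F.mse_loss(student_rep, teacher_rep)

    def logit_loss(student_logits, teacher_logits, temperature=2.0):
        # Stage 2: match temperature-softened teacher label distributions.
        p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
        return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

    # Schematic usage: train stage 1 on unlabeled transfer data, then switch to stage 2.
    student = BiLSTMStudent()
    tokens = torch.randint(0, 30000, (4, 16))  # fake batch: 4 sequences of 16 tokens
    teacher_rep = torch.randn(4, 16, 768)      # stand-in for teacher hidden states
    teacher_logits = torch.randn(4, 16, 9)     # stand-in for teacher NER logits
    rep, logits = student(tokens)
    loss = representation_loss(rep, teacher_rep)        # stage 1
    # later: loss = logit_loss(logits, teacher_logits)  # stage 2

Optimizing the stages one after another, rather than summing both losses from the start, is what makes the scheme "stage-wise": the student first inherits the teacher's representation space, then learns to predict labels within it.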
