Page: 1 2 3 4 5 6 7 8 9... 690
81 | Adapting BigScience Multilingual Model to Unseen Languages ...
82 | On Efficiently Acquiring Annotations for Multilingual Models ...
83 | Team ÚFAL at CMCL 2022 Shared Task: Figuring out the correct recipe for predicting Eye-Tracking features using Pretrained Language Models ...
84 | Does Corpus Quality Really Matter for Low-Resource Languages? ...
85 | IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages ...
86 | mSLAM: Massively multilingual joint pre-training for speech and text ...
87 | On the Representation Collapse of Sparse Mixture of Experts ...
Chi, Zewen; Dong, Li; Huang, Shaohan; Dai, Damai; Ma, Shuming; Patra, Barun; Singhal, Saksham; Bajaj, Payal; Song, Xia; Wei, Furu. - : arXiv, 2022
Abstract: Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs a routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we propose to estimate the routing scores between tokens and experts on a low-dimensional hypersphere. We conduct extensive experiments on cross-lingual language model pre-training and fine-tuning on downstream tasks. Experimental results across seven multilingual benchmarks show that our method achieves consistent gains. We also present a comprehensive analysis of the representation and routing behaviors of our models. Our method alleviates the representation collapse issue and achieves more consistent routing than baseline mixture-of-experts methods. ...
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences; Machine Learning (cs.LG)
URL: https://dx.doi.org/10.48550/arxiv.2204.09179 https://arxiv.org/abs/2204.09179
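The routing idea in this abstract — scoring tokens against experts on a low-dimensional hypersphere rather than in the full hidden space — can be sketched as cosine-similarity routing over L2-normalized embeddings. The sketch below is a minimal illustration under that reading; the function name, the projection matrix, the temperature value, and all sizes are assumptions for the example, not taken from the paper's code.

```python
import numpy as np

def hyperspherical_routing(hidden, expert_embed, proj, temperature=0.05):
    """Sketch of routing scores on a low-dimensional hypersphere.

    hidden:       (n_tokens, d_model) token hidden states
    expert_embed: (n_experts, d_low)  learnable expert embeddings
    proj:         (d_model, d_low)    projection into the routing space
    """
    # Project tokens into the low-dimensional routing space.
    t = hidden @ proj
    # L2-normalize both sides so every vector lies on the unit hypersphere;
    # the routing score is then a cosine similarity, independent of norm.
    t = t / np.linalg.norm(t, axis=-1, keepdims=True)
    e = expert_embed / np.linalg.norm(expert_embed, axis=-1, keepdims=True)
    logits = (t @ e.T) / temperature  # temperature sharpens the scores
    # Softmax over experts gives each token a routing distribution.
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs = z / z.sum(axis=-1, keepdims=True)
    return probs, probs.argmax(axis=-1)  # distribution and top-1 expert

rng = np.random.default_rng(0)
probs, assign = hyperspherical_routing(
    hidden=rng.normal(size=(4, 16)),       # 4 tokens, toy model dim 16
    expert_embed=rng.normal(size=(2, 8)),  # 2 experts in an 8-dim space
    proj=rng.normal(size=(16, 8)),
)
```

Because the scores depend only on direction, no token or expert can dominate routing by growing its norm — which is one way to read the abstract's claim that the method discourages collapse toward expert centroids.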
88 | Politics and Virality in the Time of Twitter: A Large-Scale Cross-Party Sentiment Analysis in Greece, Spain and United Kingdom ...
89 | L3Cube-MahaHate: A Tweet-based Marathi Hate Speech Detection Dataset and BERT models ...
90 | Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts ...
91 | A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model ...
92 | A New Generation of Perspective API: Efficient Multilingual Character-level Transformers ...
93 | Factual Consistency of Multilingual Pretrained Language Models ...
94 | Examining Scaling and Transfer of Language Model Architectures for Machine Translation ...
95 | MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
96 | Mono vs Multilingual BERT for Hate Speech Detection and Text Classification: A Case Study in Marathi ...
100 | From Examples to Rules: Neural Guided Rule Synthesis for Information Extraction ...