1 | Integrating Vectorized Lexical Constraints for Neural Machine Translation ...
Source: BASE
2 | Contextual Semantic-Guided Entity-Centric GCN for Relation Extraction
In: Mathematics; Volume 10; Issue 8; Pages: 1344 (2022)
3 | Virtual Reality-Integrated Immersion-Based Teaching to English Language Learning Outcome
In: Front Psychol (2022)
5 | Alternated Training with Synthetic and Authentic Data for Neural Machine Translation ...
6 | CPM-2: Large-scale Cost-effective Pre-trained Language Models ...
Zhang, Zhengyan; Gu, Yuxian; Han, Xu; Chen, Shengqi; Xiao, Chaojun; Sun, Zhenbo; Yao, Yuan; Qi, Fanchao; Guan, Jian; Ke, Pei; Cai, Yanzheng; Zeng, Guoyang; Tan, Zhixing; Liu, Zhiyuan; Huang, Minlie; Han, Wentao; Liu, Yang; Zhu, Xiaoyan; Sun, Maosong. arXiv, 2021
Abstract: In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference. (1) We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch. (2) We explore the best practice of prompt tuning with large-scale PLMs. Compared with conventional fine-tuning, prompt tuning significantly reduces the number of task-specific parameters. (3) We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources. Based on our cost-effective pipeline, we pre-train two models: an encoder-decoder bilingual model with 11 billion parameters (CPM-2) and its corresponding MoE version with 198 billion parameters. In our experiments, we ...
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2106.10715 ; https://dx.doi.org/10.48550/arxiv.2106.10715
7 | VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator ...
8 | Assessing Multilingual Fairness in Pre-trained Multimodal Representations ...
9 | DialogSum: A Real-Life Scenario Dialogue Summarization Dataset ...
10 | Transfer Learning for Sequence Generation: from Single-source to Multi-source ...
12 | Segment, Mask, and Predict: Augmenting Chinese Word Segmentation with Self-Supervision ...
13 | Learning to Selectively Learn for Weakly-supervised Paraphrase Generation ...
14 | SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection ...
15 | Analyzing the Limits of Self-Supervision in Handling Bias in Language ...
16 | Statistically significant detection of semantic shifts using contextual word embeddings ...
17 | SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection ...
18 | Statistically Significant Detection of Semantic Shifts using Contextual Word Embeddings ...
19 | Leveraging Word-Formation Knowledge for Chinese Word Sense Disambiguation ...
20 | SWSR: A Chinese Dataset and Lexicon for Online Sexism Detection ...