
Search in the Catalogues and Directories

Hits 1 – 16 of 16

1. SMDT: Selective Memory-Augmented Neural Document Translation. Zhang, Xu; Yang, Jian; Huang, Haoyang. arXiv, 2022.
2. StableMoE: Stable Routing Strategy for Mixture of Experts. Dai, Damai; Dong, Li; Ma, Shuming. arXiv, 2022.
3. DeepNet: Scaling Transformers to 1,000 Layers. Wang, Hongyu; Ma, Shuming; Dong, Li. arXiv, 2022.
4. Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt.
5. On the Representation Collapse of Sparse Mixture of Experts. Chi, Zewen; Dong, Li; Huang, Shaohan. arXiv, 2022.
6. A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model. Sun, Xin; Ge, Tao; Ma, Shuming. arXiv, 2022.
7. Towards Making the Most of Multilingual Pretraining for Zero-Shot Neural Machine Translation. Chen, Guanhua; Ma, Shuming; Chen, Yun. arXiv, 2021.
8. Zero-shot Cross-lingual Transfer of Neural Machine Translation with Multilingual Pretrained Encoders. Chen, Guanhua; Ma, Shuming; Chen, Yun. arXiv, 2021.
9. mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs. Chi, Zewen; Dong, Li; Ma, Shuming. arXiv, 2021.
10. Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task.
11. DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders. Ma, Shuming; Dong, Li; Huang, Shaohan. arXiv, 2021.
12. XLM-E: Cross-lingual Language Model Pre-training via ELECTRA. Chi, Zewen; Huang, Shaohan; Dong, Li. arXiv, 2021.
13. How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation? Findings of ACL 2021.
Abstract: While non-autoregressive (NAR) models are showing great promise for machine translation, their use is limited by their dependence on knowledge distillation from autoregressive models. To address this issue, we seek to understand why distillation is so effective. Prior work suggests that distilled training data is less complex than manual translations. Based on experiments with the Levenshtein Transformer and the Mask-Predict NAR models on the WMT14 German-English task, this paper shows that different types of complexity have different impacts: while reducing lexical diversity and decreasing reordering complexity both help NAR learn better alignment between source and target, and thus improve translation quality, lexical diversity is the main reason why distillation increases model confidence, which affects the calibration of different NAR models differently.
Keywords: Computation and Language (cs.CL); FOS: Computer and information sciences
URL: https://arxiv.org/abs/2105.12900
DOI: https://dx.doi.org/10.48550/arxiv.2105.12900
14. XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders.
15. Deconvolution-Based Global Decoding for Neural Machine Translation. Lin, Junyang; Sun, Xu; Ren, Xuancheng. arXiv, 2018.
16. A Semantic Relevance Based Neural Network for Text Summarization and Text Simplification. Ma, Shuming; Sun, Xu. arXiv, 2017.

Catalogues: 0 | Bibliographies: 0 | Linked Open Data catalogues: 0 | Online resources: 0 | Open access documents: 16
© 2013 – 2024 Lin|gu|is|tik