1 | Sememe Prediction for BabelNet Synsets using Multilingual and Multimodal Information ... | BASE
2 | YACLC: A Chinese Learner Corpus with Multidimensional Annotation ... | BASE
3 | Alternated Training with Synthetic and Authentic Data for Neural Machine Translation ... | BASE
4 | CPM-2: Large-scale Cost-effective Pre-trained Language Models ... | BASE
5 | Automatic Construction of Sememe Knowledge Bases via Dictionaries ... | BASE
6 | Sub-Character Tokenization for Chinese Pretrained Language Models ... | BASE
8 | MoEfication: Transformer Feed-forward Layers are Mixtures of Experts ... | BASE
9 | Transfer Learning for Sequence Generation: from Single-source to Multi-source ... | BASE
11 | Segment, Mask, and Predict: Augmenting Chinese Word Segmentation with Self-Supervision ... | BASE
Anthology paper link: https://aclanthology.org/2021.emnlp-main.158/
Abstract: Recent state-of-the-art (SOTA) effective neural network methods and fine-tuning methods based on pre-trained models (PTM) have been used in Chinese word segmentation (CWS), and they achieve great results. However, previous works focus on training the models with the fixed corpus at every iteration. The intermediate generated information is also valuable. Besides, the robustness of the previous neural methods is limited by the large-scale annotated data. There are a few noises in the annotated corpus. Limited efforts have been made by previous studies to deal with such problems. In this work, we propose a self-supervised CWS approach with a straightforward and effective architecture. First, we train a word segmentation model and use it to generate the segmentation results. Then, we use a revised masked language model (MLM) to evaluate the quality of the segmentation results based on the predictions of the MLM. Finally, we leverage ...
Keywords: Computational Linguistics; Language Models; Machine Learning; Machine Learning and Data Mining; Natural Language Processing; Neural Network
URL: https://underline.io/lecture/37641-segment,-mask,-and-predict-augmenting-chinese-word-segmentation-with-self-supervision
DOI: https://dx.doi.org/10.48448/axyx-nt90
12 | OpenAttack: An Open-source Textual Adversarial Attack Toolkit ... | BASE
13 | Try to Substitute: An Unsupervised Chinese Word Sense Disambiguation Method Based on HowNet ... | BASE
14 | Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence ... | BASE
16 | Improving Back-Translation with Uncertainty-based Confidence Estimation ... | BASE
17 | Towards Building a Multilingual Sememe Knowledge Base: Predicting Sememes for BabelNet Synsets ... | BASE
18 | Modeling Semantic Compositionality with Sememe Knowledge ... | BASE
19 | OpenHowNet: An Open Sememe-based Lexical Knowledge Base ... | BASE
20 | Neural Machine Translation with Explicit Phrase Alignment ... | BASE