1 |
Improving Word Translation via Two-Stage Contrastive Learning ...
2 |
Plan-then-Generate: Controlled Data-to-Text Generation via Planning ...
3 |
Prix-LM: Pretraining for Multilingual Knowledge Base Construction ...
4 |
Learning Domain-Specialised Representations for Cross-Lingual Biomedical Entity Linking ...
5 |
MirrorWiC: On Eliciting Word-in-Context Representations from Pretrained Language Models ...
7 |
Visually Grounded Reasoning across Languages and Cultures ...
8 |
Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
11 |
Self-Alignment Pretraining for Biomedical Entity Representations
Abstract:
Despite the widespread success of self-supervised learning via masked language models (MLM), accurately capturing fine-grained semantic relationships in the biomedical domain remains a challenge. This is of paramount importance for entity-level tasks such as entity linking, where the ability to model entity relations (especially synonymy) is pivotal. To address this challenge, we propose SapBERT, a pretraining scheme that self-aligns the representation space of biomedical entities. We design a scalable metric learning framework that can leverage UMLS, a massive collection of biomedical ontologies with 4M+ concepts. In contrast with previous pipeline-based hybrid systems, SapBERT offers an elegant one-model-for-all solution to the problem of medical entity linking (MEL), achieving a new state of the art (SOTA) on six MEL benchmarking datasets. In the scientific domain, we achieve SOTA even without task-specific supervision. With substantial improvement over various domain-specific pretrained MLMs such as BioBERT, SciBERT and PubMedBERT, our pretraining scheme proves to be both effective and robust.
FL is supported by the Grace & Thomas C.H. Chan Cambridge Scholarship. NC and MB would like to acknowledge funding from Health Data Research UK as part of the National Text Analytics project.
URL: https://doi.org/10.17863/CAM.72095 https://www.repository.cam.ac.uk/handle/1810/324645
12 |
Large-scale exploration of neural relation classification architectures ...
13 |
Will-They-Won't-They: A Very Large Dataset for Stance Detection on Twitter ...
15 |
Will-They-Won't-They: A Very Large Dataset for Stance Detection on Twitter
16 |
STANDER: An expert-annotated dataset for news stance detection and evidence retrieval
Conforti, C; Berndt, J; Pilehvar, MT. Association for Computational Linguistics, 2020. Findings of the Association for Computational Linguistics: EMNLP 2020.
17 |
Large-scale exploration of neural relation classification architectures
Le, HQ; Can, DC; Vu, ST. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018), 2020.