1 |
Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost
In: ACL 2022 - 60th Annual Meeting of the Association for Computational Linguistics, May 2022, Dublin, Ireland; https://hal.archives-ouvertes.fr/hal-03613101 (2022)
Abstract:
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of Mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV words with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves similar or even better performance than prior competitors, both on original datasets and on corrupted variants. Moreover, it can be used in a plug-and-play fashion with FastText and BERT, where it significantly improves their robustness.
Keywords:
[INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI]; Language models; Out-of-vocabulary (OOV) words; Word embeddings
URL: https://hal.archives-ouvertes.fr/hal-03613101/file/OOV_Problem.pdf ; https://hal.archives-ouvertes.fr/hal-03613101 ; https://hal.archives-ouvertes.fr/hal-03613101/document
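The abstract describes the Mimick-like principle only at a high level: a vector for an unseen word is predicted from its surface form, and the predictor is trained so that its outputs match pre-trained embeddings under a contrastive objective. The following is a minimal Python sketch of that idea under stated assumptions; it is not LOVE's actual encoder or training code. A bag of hashed character n-grams (FastText-style) stands in for a learned surface-form encoder, the 16-dimensional size and 0.07 temperature are arbitrary, the names `char_ngrams`, `ngram_vector`, `encode_surface`, and `info_nce` are hypothetical, and the in-batch InfoNCE loss is one standard form of contrastive objective, assumed here rather than taken from the paper.

```python
import math
import zlib

DIM = 16  # hypothetical embedding size chosen for the sketch


def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of the word wrapped in '<' and '>' boundary markers."""
    padded = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def ngram_vector(gram, dim=DIM):
    """Deterministic pseudo-random vector for one n-gram.

    Stands in for a learned n-gram embedding table (an assumption of this sketch).
    """
    seed = zlib.crc32(gram.encode("utf-8"))
    vec = []
    for _ in range(dim):
        seed = (1103515245 * seed + 12345) % (2 ** 31)  # linear congruential step
        vec.append(seed / 2 ** 30 - 1.0)  # map to [-1, 1)
    return vec


def encode_surface(word, dim=DIM):
    """Embed any word, seen or unseen, as the mean of its n-gram vectors."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * dim
    total = [0.0] * dim
    for g in grams:
        for k, x in enumerate(ngram_vector(g, dim)):
            total[k] += x
    return [x / len(grams) for x in total]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def info_nce(predicted, targets, temperature=0.07):
    """In-batch contrastive loss: predicted[i] should be closest to targets[i]
    (the pre-trained vector of the same word) and far from the other targets."""
    loss = 0.0
    for i, p in enumerate(predicted):
        logits = [cosine(p, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        loss += log_denom - logits[i]
    return loss / len(predicted)
```

Because `encode_surface` needs only the character string, it returns a vector for a misspelling such as `"embeddinng"` just as it does for an in-vocabulary word. In a real system the n-gram table would be learned by minimizing a loss like `info_nce` against the pre-trained vectors, rather than generated from a hash as it is here.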
3 |
Using Knowledge Graphs to Supplement the Content of Smaller Text Corpora ...
5 |
Semantic Search with Word Embeddings for a Multilingual Dictionary Portal ...
10 |
Measuring and Comparing Social Bias in Static and Contextual Word Embeddings
In: Dissertations (2022)
11 |
A Generative Model for Topic Discovery and Polysemy Embeddings on Directed Attributed Networks
In: Symmetry; Volume 14; Issue 4; Pages: 703 (2022)
12 |
A Method of Short Text Representation Fusion with Weighted Word Embeddings and Extended Topic Information
In: Sensors; Volume 22; Issue 3; Pages: 1066 (2022)
13 |
Emotion and Reason in Political Language
In: The Economic Journal, 132 (643) (2022)
14 |
When Classifying Arguments, BERT Doesn't Care About Word Order. Except When It Matters
In: Proceedings of the Society for Computation in Linguistics (2022)
15 |
Tackling Morphological Analogies Using Deep Learning -- Extended Version
In: https://hal.inria.fr/hal-03425776 (2021)
16 |
About Neural Networks and Writing Definitions
In: Dictionaries: Journal of the Dictionary Society of North America, 2021, 42 (2), ISSN: 2160-5076, doi:10.1353/dic.2021.0022; https://hal.archives-ouvertes.fr/hal-03547452 (2021)
17 |
Models of diachronic semantic change using word embeddings ; Modèles diachroniques à base de plongements de mot pour l'analyse du changement sémantique
In: https://tel.archives-ouvertes.fr/tel-03199801 ; Document and Text Processing. Université Paris-Saclay, 2021. English. ⟨NNT : 2021UPASG006⟩ (2021)
18 |
Multilingual word embeddings and low resources: identifying influence in Antiquity
In: JADH 2021 “Digital Humanities and COVID-19”, Organizing Committee, Japanese Association for Digital Humanities, Sep 2021, Tokyo, Japan, pp. 51-54; https://hal.archives-ouvertes.fr/hal-03340641; https://www.hi.u-tokyo.ac.jp/ (2021)
19 |
Injecting Inductive Biases into Distributed Representations of Text ...