1. Evaluating Multilingual Text Encoders for Unsupervised Cross-Lingual Retrieval ...
BASE
2. Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders ...
Abstract: Pretrained Masked Language Models (MLMs) have revolutionised NLP in recent years. However, previous work has indicated that off-the-shelf MLMs are not effective as universal lexical or sentence encoders without further task-specific fine-tuning on NLI, sentence similarity, or paraphrasing tasks using annotated task data. In this work, we demonstrate that it is possible to turn MLMs into effective universal lexical and sentence encoders even without any additional data and without any supervision. We propose an extremely simple, fast and effective contrastive learning technique, termed Mirror-BERT, which converts MLMs (e.g., BERT and RoBERTa) into such encoders in 20-30 seconds without any additional external knowledge. Mirror-BERT relies on fully identical or slightly modified string pairs as positive (i.e., synonymous) fine-tuning examples, and aims to maximise their similarity during identity fine-tuning. We report huge gains over off-the-shelf MLMs with Mirror-BERT in both lexical-level and sentence-level ...
Keywords: cs.AI; cs.CL; cs.LG
URL: https://www.repository.cam.ac.uk/handle/1810/327954 https://dx.doi.org/10.17863/cam.75407
4. Cross-lingual semantic specialization via lexical relation induction ...
5. Adversarial propagation and zero-shot cross-lingual transfer of word vector specialization ...
6. Do we really need fully unsupervised cross-lingual embeddings? ...
7. On the relation between linguistic typology and (limitations of) multilingual language modeling ...
8. Cross-lingual semantic specialization via lexical relation induction
Ponti, Edoardo; Vulić, I; Glavaš, G. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2020
9. On the relation between linguistic typology and (limitations of) multilingual language modeling
10. Adversarial propagation and zero-shot cross-lingual transfer of word vector specialization
11. Do we really need fully unsupervised cross-lingual embeddings?
Vulić, I; Glavaš, G; Reichart, R. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2020
12. Towards zero-shot language modeling
Ponti, Edoardo; Vulić, I; Cotterell, R. EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference, 2020
14. Zero-shot language transfer for cross-lingual sentence retrieval using bidirectional attention model ...
15. Learning unsupervised multilingual word embeddings with incremental multilingual hubs ...
16. Specializing distributional vectors of all words for lexical entailment ...
17. Investigating cross-lingual alignment methods for contextualized embeddings with token-level evaluation ...
18. Specializing distributional vectors of all words for lexical entailment
19. Investigating cross-lingual alignment methods for contextualized embeddings with token-level evaluation
20. Learning unsupervised multilingual word embeddings with incremental multilingual hubs
Heyman, G; Verreet, B; Vulić, I. NAACL HLT 2019 - 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference, 2019
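The Mirror-BERT abstract in entry 2 says the method treats fully identical or slightly modified string pairs as positives and maximises their similarity during fine-tuning. The truncated abstract does not state the exact loss, the kind of modification, or any hyperparameters, so the following is only a minimal sketch under stated assumptions: an InfoNCE-style contrastive loss with in-batch negatives, character-span masking as the "slight modification", and a temperature of 0.05. The function names `mask_random_span`, `cosine`, and `info_nce` are hypothetical, and toy 2-d vectors stand in for the MLM's output embeddings.

```python
import math
import random


def mask_random_span(text, span_len=3, mask_token="[MASK]", seed=0):
    """Hypothetical augmentation (an assumption, not from the abstract):
    replace one random character span of length `span_len` with `mask_token`."""
    if len(text) <= span_len:
        return text  # too short to mask; fall back to an identical pair
    rng = random.Random(seed)
    start = rng.randrange(len(text) - span_len + 1)
    return text[:start] + mask_token + text[start + span_len:]


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch (assumed objective): anchors[i] and
    positives[i] form the positive pair; every positives[j], j != i, is a negative.
    The temperature value is an assumption."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / len(anchors)


# Positive pairs: each sentence paired with a slightly modified copy of itself.
sentences = ["the cat sat on the mat", "a dog barked loudly"]
pairs = [(s, mask_random_span(s)) for s in sentences]

# In practice both sides of each pair would be encoded by the MLM; here toy
# 2-d vectors stand in for those embeddings (no real model is loaded).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]     # positives close to their own anchors
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped across the batch
print(info_nce(anchors, aligned) < info_nce(anchors, mismatched))  # prints True
```

Minimising this loss pulls each string's embedding towards that of its own (near-)copy and away from the other strings in the batch, which is one common way to "maximise similarity" of positive pairs; the actual Mirror-BERT objective should be checked against the paper itself.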
|
|