
Search in the Catalogues and Directories

Hits 1 – 6 of 6

1
FLAVA: A Foundational Language And Vision Alignment Model ...
Abstract: State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining for obtaining good performance on a variety of downstream tasks. Generally, such models are often either cross-modal (contrastive) or multi-modal (with earlier fusion) but not both; and they often only target specific modalities or tasks. A promising direction would be to use a single holistic universal model, as a "foundation", that targets all modalities at once -- a true vision and language foundation model should be good at vision tasks, language tasks, and cross- and multi-modal vision and language tasks. We introduce FLAVA as such a model and demonstrate impressive performance on a wide range of 35 tasks spanning these target modalities. ... : 18 pages ...
Keyword: Computation and Language cs.CL; Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2112.04482
https://arxiv.org/abs/2112.04482
BASE
2
Emergent Linguistic Phenomena in Multi-Agent Communication Games ...
BASE
3
Countering Language Drift via Visual Grounding ...
BASE
4
Emergent Translation in Multi-Agent Communication ...
BASE
5
Virtual Embodiment: A Scalable Long-Term Strategy for Artificial Intelligence Research ...
BASE
6
HyperLex: A Large-Scale Evaluation of Graded Lexical Entailment ...
BASE

Open access documents: 6
© 2013 - 2024 Lin|gu|is|tik