Page: 1 2 3 4 5 6 7... 212
41 | EnvEdit: Environment Editing for Vision-and-Language Navigation ... | BASE
42 | IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages ... | BASE
45 | IterVM: Iterative Vision Modeling Module for Scene Text Recognition ... | BASE
46 | AnyFace: Free-style Text-to-Face Synthesis and Manipulation ... | BASE
49 | Local-Global Context Aware Transformer for Language-Guided Video Segmentation ... | BASE
50 | Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships ... | BASE
53 | IR-GAN: Image Manipulation with Linguistic Instruction by Increment Reasoning ... | BASE
55 | AGQA 2.0: An Updated Benchmark for Compositional Spatio-Temporal Reasoning ... | BASE
56 | Self-supervised 3D Semantic Representation Learning for Vision-and-Language Navigation ... | BASE
57 | Domain Adaptation Meets Zero-Shot Learning: An Annotation-Efficient Approach to Multi-Modality Medical Image Segmentation ... | BASE
58 | Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning ... | BASE
59 | Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training ... | BASE
60 | Optimized latent-code selection for explainable conditional text-to-image GANs ... | BASE