1 |
Local-Global Context Aware Transformer for Language-Guided Video Segmentation ...
2 |
Compositional Temporal Grounding with Structured Variational Cross-Graph Correspondence Learning ...
3 |
Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation ...
6 |
ActBERT: Learning Global-Local Video-Text Representations ...
7 |
Speech-to-Singing Conversion in an Encoder-Decoder Framework ...
8 |
Symbiotic Attention with Privileged Information for Egocentric Action Recognition ...
9 |
Grounded and Controllable Image Completion by Incorporating Lexical Semantics ...
10 |
Baidu-UTS Submission to the EPIC-Kitchens Action Recognition Challenge 2019 ...
12 |
Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images ...
13 |
Overcoming Language Variation in Sentiment Analysis with Social Attention ...