61 |
Word separation in continuous sign language using isolated signs and post-processing ...
62 |
Exploring Sub-skeleton Trajectories for Interpretable Recognition of Sign Language ...
63 |
ASL-Skeleton3D and ASL-Phono: Two Novel Datasets for the American Sign Language ...
64 |
Thai Finger Spelling Recognition: Investigating MediaPipe Hands Potentials ...
65 |
Sign Language Video Retrieval with Free-Form Textual Queries ...
68 |
Sign Language Recognition System using TensorFlow Object Detection API ...
69 |
Three-dimensional reconstruction of the human body, hands, and face with applications in sign language recognition ...
70 |
Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation ...
71 |
hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification ...
72 |
Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework ...
73 |
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition ...
74 |
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos ...
75 |
Taking an Emotional Look at Video Paragraph Captioning ...
Abstract:
Translating visual data into natural language is essential for machines to understand the world and interact with humans. In this work, a comprehensive study is conducted on video paragraph captioning, with the goal of generating paragraph-level descriptions for a given video. However, current research mainly focuses on detecting objective facts, ignoring the need to establish logical associations between sentences and to discover more accurate emotions related to video content. This problem impairs the fluency and richness of predicted captions, which fall far below human language standards. To solve this problem, we propose to construct a large-scale emotion- and logic-driven multilingual dataset for this task. This dataset is named EMVPC (standing for "Emotional Video Paragraph Captioning") and contains 53 widely used emotions in daily life, 376 common scenes corresponding to these emotions, 10,291 high-quality videos, and 20,582 elaborated paragraph captions with English and Chinese versions. ...
Keyword:
Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2203.06356 https://arxiv.org/abs/2203.06356
76 |
EnvEdit: Environment Editing for Vision-and-Language Navigation ...
77 |
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages ...
80 |
IterVM: Iterative Vision Modeling Module for Scene Text Recognition ...