Page: 1 2 3 4 5 6 7 8... 870
61 |
Word separation in continuous sign language using isolated signs and post-processing ...
62 |
Exploring Sub-skeleton Trajectories for Interpretable Recognition of Sign Language ...
63 |
ASL-Skeleton3D and ASL-Phono: Two Novel Datasets for the American Sign Language ...
64 |
Thai Finger Spelling Recognition: Investigating MediaPipe Hands Potentials ...
65 |
Sign Language Video Retrieval with Free-Form Textual Queries ...
68 |
Sign Language Recognition System using TensorFlow Object Detection API ...
69 |
Three-dimensional reconstruction of the human body, hands and face with applications in sign language recognition ...
70 |
Biasing Like Human: A Cognitive Bias Framework for Scene Graph Generation ...
71 |
hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification ...
72 |
Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework ...
Gu, Jiaxi; Meng, Xiaojun; Lu, Guansong; Hou, Lu; Niu, Minzhe; Liang, Xiaodan; Yao, Lewei; Huang, Runhui; Zhang, Wei; Jiang, Xin; Xu, Chunjing; Xu, Hang. arXiv, 2022
Abstract:
Vision-Language Pre-training (VLP) models have shown remarkable performance on various downstream tasks. Their success heavily relies on the scale of pre-trained cross-modal datasets. However, the lack of large-scale datasets and benchmarks in Chinese hinders the development of Chinese VLP models and broader multilingual applications. In this work, we release a large-scale Chinese cross-modal dataset named Wukong, containing 100 million Chinese image-text pairs from the web. Wukong aims to benchmark different multi-modal pre-training methods to facilitate the VLP research and community development. Furthermore, we release a group of models pre-trained with various image encoders (ViT-B/ViT-L/SwinT) and also apply advanced pre-training techniques into VLP such as locked-image text tuning, token-wise similarity in contrastive learning, and reduced-token interaction. Extensive experiments and a deep benchmarking of different downstream tasks are also provided. Experiments show that Wukong can serve as a ...
Keyword:
Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences; Machine Learning cs.LG
URL: https://arxiv.org/abs/2202.06767 https://dx.doi.org/10.48550/arxiv.2202.06767
73 |
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition ...
74 |
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos ...
76 |
EnvEdit: Environment Editing for Vision-and-Language Navigation ...
77 |
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages ...
80 |
IterVM: Iterative Vision Modeling Module for Scene Text Recognition ...