1 |
Multi-scale temporal network for continuous sign language recognition ...
|
|
|
|
Abstract:
Continuous Sign Language Recognition (CSLR) is a challenging research task due to the lack of accurate annotation on the temporal sequence of sign language data. The recent popular usage is a hybrid model based on "CNN + RNN" for CSLR. However, when extracting temporal features in these works, most of the methods using a fixed temporal receptive field and cannot extract the temporal features well for each sign language word. In order to obtain more accurate temporal features, this paper proposes a multi-scale temporal network (MSTNet). The network mainly consists of three parts. The Resnet and two fully connected (FC) layers constitute the frame-wise feature extraction part. The time-wise feature extraction part performs temporal feature learning by first extracting temporal receptive field features of different scales using the proposed multi-scale temporal block (MST-block) to improve the temporal modeling capability, and then further encoding the temporal features of different scales by the transformers ... : 22 pages, 7 figures ...
|
|
Keyword:
Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences
|
|
URL: https://arxiv.org/abs/2204.03864 https://dx.doi.org/10.48550/arxiv.2204.03864
|
|
BASE
|
|
Hide details
|
|
|
|