41 | The use of artificial intelligence and robotics in regional anaesthesia | BASE
42 | A Novel Speech to Mouth Articulation System for Realistic Humanoid Robots | BASE
43 | Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales ... | BASE
44 | LanguageRefer: Spatial-Language Model for 3D Visual Grounding ... | BASE
Abstract: For robots to understand human instructions and perform meaningful tasks in the near future, it is important to develop learned models that comprehend referential language to identify common objects in real-world 3D scenes. In this paper, we introduce a spatial-language model for a 3D visual grounding problem. Specifically, given a reconstructed 3D scene in the form of point clouds with 3D bounding boxes of potential object candidates, and a language utterance referring to a target object in the scene, our model successfully identifies the target object from a set of potential candidates. Specifically, LanguageRefer uses a transformer-based architecture that combines spatial embedding from bounding boxes with fine-tuned language embeddings from DistilBert to predict the target object. We show that it performs competitively on visio-linguistic datasets proposed by ReferIt3D. Further, we analyze its spatial reasoning task performance decoupled from perception noise, the accuracy of view-dependent utterances, ... : 11 pages, 3 figures ...
Keyword: Computation and Language cs.CL; Computer Vision and Pattern Recognition cs.CV; FOS Computer and information sciences; Robotics cs.RO
URL: https://dx.doi.org/10.48550/arxiv.2107.03438 https://arxiv.org/abs/2107.03438
45 | VISITRON: Visual Semantics-Aligned Interactively Trained Object-Navigator ... | BASE
46 | CrossMap Transformer: A Crossmodal Masked Path Transformer Using Double Back-Translation for Vision-and-Language Navigation ... | BASE
47 | Exploiting Natural Language for Efficient Risk-Aware Multi-robot SaR Planning ... | BASE
48 | Embodying Pre-Trained Word Embeddings Through Robot Actions ... | BASE
49 | Kinematic Motion Retargeting via Neural Latent Optimization for Learning Sign Language ... | BASE
50 | Neural Variational Learning for Grounded Language Acquisition ... | BASE
55 | The effectiveness of face-name mnemonics on name recall ... | BASE
56 | The effectiveness of face-name mnemonics on name recall ... | BASE
57 | First Steps Toward a Swarm Robotics Model of Self-Domestication and Language Evolution ... | BASE
58 | Using narrative to manipulate perceived mind and word order during language production ... | BASE
59 | Does encouraging gesture use help us connect remote associations?: The role of mental imagery ... | BASE
60 | Does encouraging gesture use help us connect remote associations?: The role of mental imagery ... | BASE