Paper Title
Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Information for Robot Pose Prediction
Paper Authors
Paper Abstract
Deep robot vision models are widely used for recognizing objects from camera images, but they show poor performance when detecting objects at untrained positions. Although this problem can be alleviated by training with large datasets, the cost of dataset collection cannot be ignored. Existing visual attention models tackle the problem by employing a data-efficient structure that learns to extract task-relevant image areas. However, since these models cannot modify their attention targets after training, they are difficult to apply to dynamically changing tasks. This paper proposes a novel Key-Query-Value-formulated visual attention model that can switch attention targets by externally modifying the Query representations, i.e., top-down attention. The proposed model was evaluated in a simulator and in a real-world environment. In the simulator experiments, the model was compared to existing end-to-end robot vision models and showed higher performance and data efficiency. In the real-world robot experiments, the model showed high precision along with scalability and extensibility.
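Since the abstract centers on a Key-Query-Value attention mechanism whose Query is supplied externally, a minimal sketch may help illustrate the idea. This is an illustrative assumption, not the paper's implementation: the module name TopDownKQVAttention, the linear projections, and the tensor shapes are all hypothetical.

```python
import torch
import torch.nn as nn


class TopDownKQVAttention(nn.Module):
    """Minimal sketch of Key-Query-Value visual attention in which the Query
    is provided externally (top-down) rather than computed from the image,
    so the attention target can be switched without retraining."""

    def __init__(self, feat_dim: int, attn_dim: int):
        super().__init__()
        # Bottom-up pathway: Keys and Values are projected from image features.
        self.key_proj = nn.Linear(feat_dim, attn_dim)
        self.value_proj = nn.Linear(feat_dim, attn_dim)
        self.scale = attn_dim ** -0.5

    def forward(self, features: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # features: (batch, num_pixels, feat_dim) flattened CNN feature map
        # query:    (batch, attn_dim) externally set top-down Query
        keys = self.key_proj(features)      # (batch, num_pixels, attn_dim)
        values = self.value_proj(features)  # (batch, num_pixels, attn_dim)
        # Scaled dot-product scores between the top-down Query and each pixel's Key.
        scores = torch.einsum("bd,bpd->bp", query, keys) * self.scale
        weights = scores.softmax(dim=-1)    # attention map over image positions
        # Weighted sum of Values: a task-relevant summary for downstream pose prediction.
        return torch.einsum("bp,bpd->bd", weights, values)
```

Because the Query enters only through the dot-product scores, replacing it at inference time redirects the attention map to a different target while the learned Key and Value projections stay fixed, which is the property the abstract attributes to top-down attention.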