Paper Title

Robotic grasp detection based on Transformer

Paper Authors

Dong, Mingshuai; Yu, Xiuli

Paper Abstract

Grasp detection in a cluttered environment is still a great challenge for robots. Recently, the Transformer mechanism has been successfully applied to visual tasks, and its excellent ability to extract global context information offers a feasible way to improve the performance of robotic grasp detection in cluttered scenes. However, the insufficient inductive bias of the original Transformer model requires training on large-scale datasets, which are difficult to obtain for grasp detection. In this paper, we propose a grasp detection model based on an encoder-decoder structure. The encoder uses a Transformer network to extract global context information. The decoder uses a fully convolutional neural network to improve the inductive bias of the model and combines the features extracted by the encoder to predict the final grasp configuration. Experiments on the VMRD dataset demonstrate that our model performs much better in overlapping-object scenes. Meanwhile, on the Cornell Grasp dataset, our approach achieves an accuracy of 98.1%, which is comparable to state-of-the-art algorithms.
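The architecture described in the abstract can be summarized as a ViT-style Transformer encoder for global context followed by a fully convolutional decoder that restores spatial resolution and supplies the locality bias a plain Transformer lacks. Below is a minimal PyTorch sketch of that encoder-decoder layout. The patch size, embedding width, layer counts, and the quality/angle/width output heads are illustrative assumptions, not details given in the abstract.

```python
import torch
import torch.nn as nn

class TransformerGraspDetector(nn.Module):
    """Sketch: Transformer encoder (global context) + FCN decoder (inductive bias)."""

    def __init__(self, img_size=224, patch=16, in_ch=3, embed_dim=256,
                 num_heads=8, depth=6):
        super().__init__()
        # ViT-style patch embedding: non-overlapping patches -> token sequence.
        self.patch_embed = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads,
                                           batch_first=True)
        # Encoder: self-attention mixes information across all patches (global context).
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Decoder: fully convolutional upsampling restores spatial resolution and
        # injects the locality bias that the plain Transformer lacks.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(embed_dim, 128, kernel_size=4, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=4), nn.ReLU(),
        )
        # Hypothetical pixel-wise grasp heads: quality, angle (cos/sin), gripper width.
        self.quality = nn.Conv2d(64, 1, kernel_size=1)
        self.angle = nn.Conv2d(64, 2, kernel_size=1)
        self.width = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        b = x.size(0)
        feat = self.patch_embed(x)                # (B, C, H/16, W/16)
        hp, wp = feat.shape[-2:]
        tokens = feat.flatten(2).transpose(1, 2)  # (B, N, C)
        tokens = self.encoder(tokens + self.pos_embed)
        feat = tokens.transpose(1, 2).reshape(b, -1, hp, wp)
        feat = self.decoder(feat)                 # upsample back to (B, 64, H, W)
        return self.quality(feat), self.angle(feat), self.width(feat)

if __name__ == "__main__":
    model = TransformerGraspDetector()
    q, ang, w = model(torch.randn(1, 3, 224, 224))
    print(q.shape, ang.shape, w.shape)  # each head maps over the full 224x224 image
```

Encoding the grasp angle as a (cos, sin) pair is a common choice in pixel-wise grasp detectors (e.g., GG-CNN) because it avoids the discontinuity at the angle wrap-around; whether this paper parameterizes its grasp configuration the same way is not stated in the abstract.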
