Paper Title

YOLOPose: Transformer-based Multi-Object 6D Pose Estimation using Keypoint Regression

Paper Authors

Arash Amini, Arul Selvam Periyasamy, Sven Behnke

Paper Abstract

6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are convolutional neural network (CNN)-based. Lately, Transformers, an architecture originally proposed for natural language processing, have been achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short form for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression. In contrast to the standard heatmaps for predicting keypoints in an image, we directly regress the keypoints. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Along with a separate translation estimation module, our model is end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
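
The abstract only outlines the architecture at a high level. Below is a minimal PyTorch sketch of how a YOLOPose-style head could be wired together: a DETR-like decoder emits one embedding per object query, 2D keypoints are regressed directly from each embedding (no heatmaps), and a learnable module maps the keypoints to an orientation. All module names, dimensions, and the choice of a 6D rotation parameterization are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a YOLOPose-style prediction head (illustrative
# assumptions, not the paper's actual code).

import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointRegressionHead(nn.Module):
    """Directly regress N 2D keypoints per object query, instead of heatmaps."""
    def __init__(self, d_model=256, num_keypoints=8):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, num_keypoints * 2),
        )

    def forward(self, query_embed):
        # query_embed: (batch, num_queries, d_model) from a DETR-like decoder.
        kpts = self.mlp(query_embed).sigmoid()  # normalized image coords in [0, 1]
        return kpts.view(*query_embed.shape[:2], self.num_keypoints, 2)

class OrientationFromKeypoints(nn.Module):
    """Learnable orientation module mapping regressed keypoints to a rotation.

    The paper only states that this module is learnable; mapping to a 6D
    rotation representation orthonormalized via Gram-Schmidt is an
    illustrative choice here.
    """
    def __init__(self, num_keypoints=8, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_keypoints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 6),
        )

    def forward(self, kpts):
        # kpts: (batch, num_queries, num_keypoints, 2)
        sixd = self.mlp(kpts.flatten(2))                   # (b, q, 6)
        a1, a2 = sixd[..., :3], sixd[..., 3:]
        b1 = F.normalize(a1, dim=-1)
        b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
        b3 = torch.cross(b1, b2, dim=-1)
        return torch.stack([b1, b2, b3], dim=-1)           # (b, q, 3, 3) rotations

# Usage with dummy query embeddings standing in for the decoder output:
queries = torch.randn(2, 10, 256)        # (batch, num_queries, d_model)
keypoints = KeypointRegressionHead()(queries)   # (2, 10, 8, 2)
rotations = OrientationFromKeypoints()(keypoints)  # (2, 10, 3, 3)
```

In this sketch, gradients flow from the rotation output back through the keypoint regressor, matching the abstract's claim that the model is end-to-end differentiable apart from the separate translation estimation module.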
