Paper Title

YOLOPose: Transformer-based Multi-Object 6D Pose Estimation using Keypoint Regression

Paper Authors

Arash Amini, Arul Selvam Periyasamy, Sven Behnke

Paper Abstract

6D object pose estimation is a crucial prerequisite for autonomous robot manipulation applications. The state-of-the-art models for pose estimation are convolutional neural network (CNN)-based. Lately, Transformers, an architecture originally proposed for natural language processing, have been achieving state-of-the-art results in many computer vision tasks as well. Equipped with the multi-head self-attention mechanism, Transformers enable simple single-stage end-to-end architectures for learning object detection and 6D object pose estimation jointly. In this work, we propose YOLOPose (short form for You Only Look Once Pose estimation), a Transformer-based multi-object 6D pose estimation method based on keypoint regression. In contrast to the standard heatmaps for predicting keypoints in an image, we directly regress the keypoints. Additionally, we employ a learnable orientation estimation module to predict the orientation from the keypoints. Along with a separate translation estimation module, our model is end-to-end differentiable. Our method is suitable for real-time applications and achieves results comparable to state-of-the-art methods.
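
The abstract only outlines the architecture at a high level. Below is a minimal PyTorch sketch of how a YOLOPose-style head could be wired together: a DETR-like decoder emits one embedding per object query, 2D keypoints are regressed directly from each embedding (no heatmaps), and a learnable module maps the keypoints to an orientation. All module names, dimensions, and the choice of a 6D rotation parameterization are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a YOLOPose-style prediction head (illustrative
# assumptions, not the paper's actual code).

import torch
import torch.nn as nn
import torch.nn.functional as F

class KeypointRegressionHead(nn.Module):
    """Directly regress N 2D keypoints per object query, instead of heatmaps."""
    def __init__(self, d_model=256, num_keypoints=8):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, num_keypoints * 2),
        )

    def forward(self, query_embed):
        # query_embed: (batch, num_queries, d_model) from a DETR-like decoder.
        kpts = self.mlp(query_embed).sigmoid()  # normalized image coords in [0, 1]
        return kpts.view(*query_embed.shape[:2], self.num_keypoints, 2)

class OrientationFromKeypoints(nn.Module):
    """Learnable orientation module mapping regressed keypoints to a rotation.

    The paper only states that this module is learnable; mapping to a 6D
    rotation representation orthonormalized via Gram-Schmidt is an
    illustrative choice here.
    """
    def __init__(self, num_keypoints=8, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_keypoints * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 6),
        )

    def forward(self, kpts):
        # kpts: (batch, num_queries, num_keypoints, 2)
        sixd = self.mlp(kpts.flatten(2))                   # (b, q, 6)
        a1, a2 = sixd[..., :3], sixd[..., 3:]
        b1 = F.normalize(a1, dim=-1)
        b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
        b3 = torch.cross(b1, b2, dim=-1)
        return torch.stack([b1, b2, b3], dim=-1)           # (b, q, 3, 3) rotations

# Usage with dummy query embeddings standing in for the decoder output:
queries = torch.randn(2, 10, 256)        # (batch, num_queries, d_model)
keypoints = KeypointRegressionHead()(queries)   # (2, 10, 8, 2)
rotations = OrientationFromKeypoints()(keypoints)  # (2, 10, 3, 3)
```

In this sketch, gradients flow from the rotation output back through the keypoint regressor, matching the abstract's claim that the model is end-to-end differentiable apart from the separate translation estimation module.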
