NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation

Paper title

NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation

Authors

Li, Fu; Yu, Hao; Shugurov, Ivan; Busam, Benjamin; Yang, Shaowu; Ilic, Slobodan

Abstract

Pose estimation of 3D objects in monocular images is a fundamental and long-standing problem in computer vision. Existing deep learning approaches for 6D pose estimation typically assume that 3D object models and 6D pose annotations are available. However, precise annotation of 6D poses in real data is intricate, time-consuming and not scalable, while synthetic data scales well but lacks realism. To avoid these problems, we present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training. Following the first-reconstruct-then-regress idea, we first reconstruct the objects from multiple views in the form of an implicit neural representation. Then, we train a pose regression network to predict pixel-wise 2D-3D correspondences between images and the reconstructed model. At inference time, the approach needs only a single image as input. A NeRF-enabled PnP+RANSAC algorithm is used to estimate a stable and accurate pose from the predicted correspondences. Experiments on LineMod and LineMod-Occlusion show that the proposed method achieves state-of-the-art accuracy compared with the best 6D pose estimation methods, despite being trained only with weak labels. In addition, we extend the Homebrewed DB dataset with more real training images to support the weakly supervised task and achieve compelling results on this dataset. The extended dataset and code will be released soon.
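The inference step described above recovers the object pose from predicted 2D-3D correspondences with PnP+RANSAC. The following is a minimal sketch of that general idea only. It runs a generic linear (DLT) PnP solver inside a plain RANSAC loop, written in NumPy, on synthetic correspondences. It is not the paper's NeRF-enabled variant. The function names (`project`, `pnp_dlt`, `pnp_ransac`, `axis_angle`) are hypothetical, and so are the camera intrinsics, the 2-pixel inlier threshold, the iteration count and the synthetic data. All of them are illustrative assumptions.

```python
import numpy as np


def project(K, R, t, X):
    """Project Nx3 object points into Nx2 pixel coordinates with pose (R, t)."""
    Xc = X @ R.T + t                   # object frame -> camera frame
    uvw = Xc @ K.T                     # apply the intrinsics K
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division


def pnp_dlt(K, X, x):
    """Linear (DLT) PnP: estimate R, t from >= 6 non-coplanar 2D-3D pairs."""
    # Normalized image coordinates K^-1 [u, v, 1]^T.
    xn = np.column_stack([x, np.ones(len(x))]) @ np.linalg.inv(K).T
    rows = []
    for Xi, (u, v, _) in zip(X, xn):
        P = [*Xi, 1.0]
        # Each pair gives two linear equations in the 12 entries of M = s[R | t].
        rows.append([*P, 0.0, 0.0, 0.0, 0.0, *(-u * p for p in P)])
        rows.append([0.0, 0.0, 0.0, 0.0, *P, *(-v * p for p in P)])
    # The null vector of the stacked system is the last right-singular vector.
    M = np.linalg.svd(np.asarray(rows))[2][-1].reshape(3, 4)
    # Snap M[:, :3] to the nearest orthogonal matrix and recover the scale |s|.
    U, S, Vt = np.linalg.svd(M[:, :3])
    R, t = U @ Vt, M[:, 3] / S.mean()
    if np.linalg.det(R) < 0:           # the null vector is only defined up to sign
        R, t = -R, -t
    return R, t


def pnp_ransac(K, X, x, iters=200, thresh_px=2.0, seed=0):
    """Plain RANSAC around pnp_dlt using minimal 6-point samples (hypothetical)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(X), dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(iters):
            idx = rng.choice(len(X), size=6, replace=False)
            R, t = pnp_dlt(K, X[idx], x[idx])
            err = np.linalg.norm(project(K, R, t, X) - x, axis=1)
            inliers = err < thresh_px  # NaN errors from degenerate fits count as outliers
            if inliers.sum() > best.sum():
                best = inliers
    if best.sum() < 6:
        raise RuntimeError("no pose hypothesis with enough inliers")
    R, t = pnp_dlt(K, X[best], x[best])  # refit on all inliers of the best hypothesis
    return R, t, best


def axis_angle(axis, angle):
    """Rodrigues' formula: rotation matrix for a rotation about `axis`."""
    a = np.asarray(axis, dtype=float) / np.linalg.norm(axis)
    A = np.array([[0.0, -a[2], a[1]], [a[2], 0.0, -a[0]], [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * A + (1.0 - np.cos(angle)) * A @ A


# Synthetic stand-in for the regressed correspondences (all values are made up).
rng = np.random.default_rng(42)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R_true, t_true = axis_angle([1, 2, 3], 0.5), np.array([0.05, -0.02, 0.8])
X = rng.uniform(-0.1, 0.1, size=(100, 3))   # 3D points on the (reconstructed) model
x = project(K, R_true, t_true, X)           # their exact 2D projections
is_outlier = np.zeros(len(X), dtype=bool)
is_outlier[:30] = True                      # corrupt 30% of the correspondences
x[is_outlier] += rng.choice([-1.0, 1.0], size=(30, 2)) * rng.uniform(50, 100, size=(30, 2))

R_est, t_est, inlier_mask = pnp_ransac(K, X, x)
rot_err_deg = np.degrees(np.arccos(np.clip((np.trace(R_est.T @ R_true) - 1) / 2, -1, 1)))
print("inliers:", int(inlier_mask.sum()), "rotation error (deg):", rot_err_deg)
```

The hand-written loop is only meant to make the two stages visible: hypothesize a pose from a minimal sample, then score it by reprojection error. A practical pipeline would normally call an existing solver such as OpenCV's `solvePnPRansac`. The paper's NeRF-enabled refinement is not reproduced here.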
