Paper Title


Multi-view object pose estimation from correspondence distributions and epipolar geometry

Authors

Rasmus Laurvig Haugaard, Thorbjørn Mosekjær Iversen

Abstract


In many automation tasks involving manipulation of rigid objects, the poses of the objects must be acquired. Vision-based pose estimation using a single RGB or RGB-D sensor is especially popular due to its broad applicability. However, single-view pose estimation is inherently limited by depth ambiguity and ambiguities imposed by various phenomena like occlusion, self-occlusion, reflections, etc. Aggregation of information from multiple views can potentially resolve these ambiguities, but the current state-of-the-art multi-view pose estimation method only uses multiple views to aggregate single-view pose estimates, and thus relies on obtaining good single-view estimates. We present a multi-view pose estimation method which aggregates learned 2D-3D distributions from multiple views for both the initial estimate and optional refinement. Our method performs probabilistic sampling of 3D-3D correspondences under epipolar constraints using learned 2D-3D correspondence distributions which are implicitly trained to respect visual ambiguities such as symmetry. Evaluation on the T-LESS dataset shows that our method reduces pose estimation errors by 80-91% compared to the best single-view method, and we present state-of-the-art results on T-LESS with four views, even compared with methods using five and eight views.
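The core idea of sampling correspondences under epipolar constraints can be illustrated in isolation: a pixel in one view constrains its match in another view to lie on an epipolar line given by the fundamental matrix between the two cameras. The sketch below is not the paper's implementation — the helper names (`fundamental_from_projections`, `sample_match_on_epipolar_line`) and the per-pixel probability map `prob2` standing in for a learned correspondence distribution are illustrative assumptions; only the underlying epipolar geometry (F = [e₂]ₓ P₂ P₁⁺) is standard.

```python
import numpy as np

def skew(v):
    # 3x3 cross-product matrix: skew(v) @ u == np.cross(v, u).
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def fundamental_from_projections(P1, P2):
    # Standard construction: F = [e2]_x P2 P1^+, where e2 is the epipole
    # (projection of camera 1's centre into view 2). Satisfies x2^T F x1 = 0
    # for projections x1, x2 of the same 3D point.
    C1 = np.linalg.svd(P1)[2][-1]        # null space of P1 = camera-1 centre
    e2 = P2 @ C1
    return skew(e2) @ P2 @ np.linalg.pinv(P1)

def sample_match_on_epipolar_line(x1, F, prob2, coords2, tol=2.0, rng=None):
    # x1: homogeneous pixel in view 1; prob2: per-pixel correspondence
    # probabilities in view 2 (stand-in for a learned distribution);
    # coords2: homogeneous pixel coordinates in view 2, shape (N, 3).
    l = F @ x1                                        # epipolar line in view 2
    d = np.abs(coords2 @ l) / np.hypot(l[0], l[1])    # point-to-line distances
    w = prob2 * (d < tol)                             # zero out off-line pixels
    if w.sum() == 0:
        return None                                   # no candidate near the line
    w = w / w.sum()
    idx = (rng or np.random.default_rng()).choice(len(w), p=w)
    return coords2[idx]
```

Restricting the sample to a band of width `tol` around the epipolar line is what turns two independent 2D distributions into a joint sample of a 3D-3D correspondence candidate; the selected pixel pair can then be triangulated.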
