Paper Title

Move to See Better: Self-Improving Embodied Object Detection

Authors

Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki

Abstract

Passive methods for object detection and segmentation treat images of the same scene as individual samples and do not exploit object permanence across multiple views. Generalization to novel or difficult viewpoints thus requires additional training with lots of annotations. In contrast, humans often recognize objects by simply moving around to get more informative viewpoints. In this paper, we propose a method for improving object detection in testing environments, assuming nothing but an embodied agent with a pre-trained 2D object detector. Our agent collects multi-view data, generates 2D and 3D pseudo-labels, and fine-tunes its detector in a self-supervised manner. Experiments on both indoor and outdoor datasets show that (1) our method obtains high-quality 2D and 3D pseudo-labels from multi-view RGB-D data; (2) fine-tuning with these pseudo-labels improves the 2D detector significantly in the test environment; (3) training a 3D detector with our pseudo-labels outperforms a prior self-supervised method by a large margin; (4) given weak supervision, our method can generate better pseudo-labels for novel objects.
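The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical sketch of the multi-view pseudo-labeling idea it outlines, not the paper's actual implementation: confident single-view detections are lifted into a shared 3D world frame using depth and camera poses, and the fused object point clouds are reprojected into every view to produce 2D pseudo-boxes. It assumes each view is a dict with keys "depth", "K" (3x3 intrinsics), "pose" (4x4 camera-to-world), and "detections" (box, class, score triples from the pre-trained 2D detector); all function names and the 0.9 confidence threshold are illustrative.

```python
import numpy as np


def unproject(depth, box, K, cam_to_world):
    """Lift depth pixels inside a 2D box into world-space 3D points."""
    x0, y0, x1, y1 = [int(c) for c in box]
    ys, xs = np.mgrid[y0:y1, x0:x1]
    zs = depth[y0:y1, x0:x1]
    xs, ys, zs = xs[zs > 0], ys[zs > 0], zs[zs > 0]  # drop invalid depth
    # Back-project through the pinhole intrinsics K (3x3).
    pts_cam = np.stack([(xs - K[0, 2]) * zs / K[0, 0],
                        (ys - K[1, 2]) * zs / K[1, 1],
                        zs], axis=-1)
    pts_h = np.concatenate([pts_cam, np.ones((len(pts_cam), 1))], axis=-1)
    return (cam_to_world @ pts_h.T).T[:, :3]  # a shared world frame lets views be fused


def project(points, K, world_to_cam):
    """Project world-space 3D points into a view's pixel grid."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=-1)
    pts_cam = (world_to_cam @ pts_h.T).T[:, :3]
    pts_cam = pts_cam[pts_cam[:, 2] > 0]  # keep only points in front of the camera
    uv = (K @ pts_cam.T).T
    return uv[:, :2] / uv[:, 2:3]


def pseudo_labels(views, conf_thresh=0.9):
    """Fuse confident single-view detections in 3D, then re-label every view."""
    objects = []  # (world-space point cloud, class id) per confident detection
    for v in views:
        for box, cls, score in v["detections"]:
            if score >= conf_thresh:
                objects.append((unproject(v["depth"], box, v["K"], v["pose"]), cls))
    all_labels = []
    for v in views:
        world_to_cam = np.linalg.inv(v["pose"])
        boxes = []
        for pts, cls in objects:
            uv = project(pts, v["K"], world_to_cam)
            if len(uv) == 0:
                continue  # object is not visible from this view
            # 2D pseudo-box: tight bound of the reprojected object points.
            boxes.append((uv[:, 0].min(), uv[:, 1].min(),
                          uv[:, 0].max(), uv[:, 1].max(), cls))
        all_labels.append(boxes)
    return all_labels
```

In this reading, the reprojected pseudo-boxes supply "free" training targets in views where the detector was originally unconfident, which is what makes the self-supervised fine-tuning step possible; a real system would also need association of detections across views and occlusion handling, which this sketch omits.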
