Paper Title

DFVS: Deep Flow Guided Scene Agnostic Image Based Visual Servoing

Paper Authors

Y V S Harish, Harit Pandya, Ayush Gaud, Shreya Terupally, Sai Shankar, K. Madhava Krishna

Abstract

Existing deep learning based visual servoing approaches regress the relative camera pose between a pair of images. Therefore, they require a huge amount of training data and sometimes fine-tuning to adapt to a novel scene. Furthermore, current approaches do not consider the underlying geometry of the scene and rely on direct estimation of the camera pose. Thus, inaccuracies in the predicted camera pose, especially for distant goals, lead to a degradation in servoing performance. In this paper, we propose a two-fold solution: (i) we consider optical flow as our visual features, which are predicted using a deep neural network; (ii) these flow features are then systematically integrated with depth estimates provided by another neural network using the interaction matrix. We further present an extensive benchmark in a photo-realistic 3D simulation across diverse scenes to study the convergence and generalisation of visual servoing approaches. We show convergence for over 3 m and 40 degrees while maintaining precise positioning of under 2 cm and 1 degree on our challenging benchmark, where the existing approaches are unable to converge for the majority of scenarios beyond 1.5 m and 20 degrees. Furthermore, we also evaluate our approach in a real scenario on an aerial robot. Our approach generalises to novel scenarios, producing precise and robust servoing performance for 6-degrees-of-freedom positioning tasks even under large camera transformations, without any retraining or fine-tuning.
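The integration step the abstract describes — combining flow-based feature errors with per-point depth estimates through the interaction matrix to obtain a 6-DoF camera velocity — follows the classical image-based visual servoing control law. Below is a minimal sketch of that law using NumPy; the point coordinates, depths, gain, and function names are illustrative assumptions, not the paper's implementation (which obtains the flow and depth from deep networks).

```python
import numpy as np

def interaction_matrix(points, depths):
    """Stack the 2x6 interaction matrix of each normalized image
    point (x, y) at estimated depth Z (classical IBVS formulation)."""
    rows = []
    for (x, y), Z in zip(points, depths):
        rows.append([-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y])
        rows.append([0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x])
    return np.asarray(rows)  # shape (2N, 6)

def servo_velocity(flow_error, points, depths, gain=0.5):
    """6-DoF camera velocity (vx, vy, vz, wx, wy, wz) driving the
    flow-like feature error to zero: v = -gain * pinv(L) @ e."""
    L = interaction_matrix(points, depths)
    e = np.asarray(flow_error).reshape(-1)  # (2N,) feature error
    return -gain * np.linalg.pinv(L) @ e

# Toy example: two tracked points whose flow error is along x only.
pts = [(0.1, -0.2), (-0.3, 0.05)]   # normalized image coordinates
Z = [2.0, 2.5]                      # depth estimates (e.g. from a network)
v = servo_velocity([(0.02, 0.0), (0.02, 0.0)], pts, Z)
print(v.shape)  # (6,)
```

With dense flow, the same law applies with many more rows in `L`; using network-predicted depth in place of a known scene model is what makes the approach scene-agnostic.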
