$S^3$Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data

Paper title

$S^3$Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data

Authors

Bin Cheng, Inderjot Singh Saggu, Raunak Shah, Gaurav Bansal, Dinesh Bharadia

Abstract

Solving depth estimation with monocular cameras enables the possibility of widespread use of cameras as low-cost depth estimation sensors in applications such as autonomous driving and robotics. However, learning such a scalable depth estimation model would require a lot of labeled data which is expensive to collect. There are two popular existing approaches which do not require annotated depth maps: (i) using labeled synthetic and unlabeled real data in an adversarial framework to predict more accurate depth, and (ii) unsupervised models which exploit geometric structure across space and time in monocular video frames. Ideally, we would like to leverage features provided by both approaches as they complement each other; however, existing methods do not adequately exploit these additive benefits. We present $S^3$Net, a self-supervised framework which combines these complementary features: we use synthetic and real-world images for training while exploiting geometric, temporal, as well as semantic constraints. Our novel consolidated architecture provides a new state-of-the-art in self-supervised depth estimation using monocular videos. We present a unique way to train this self-supervised framework, and achieve (i) more than $15\%$ improvement over previous synthetic supervised approaches that use domain adaptation and (ii) more than $10\%$ improvement over previous self-supervised approaches which exploit geometric constraints from the real data.
