Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion

Paper Title

Unsupervised Foggy Scene Understanding via Self Spatial-Temporal Label Diffusion

Authors

Liao, Liang; Chen, Wenyi; Xiao, Jing; Wang, Zheng; Lin, Chia-Wen; Satoh, Shin'ichi

Abstract

Understanding foggy image sequences in driving scenes is critical for autonomous driving. It remains a challenging task, however, because real-world images of adverse weather are difficult to collect and annotate. Recently, the self-training strategy has been considered a powerful solution for unsupervised domain adaptation: it iteratively adapts the model from the source domain to the target domain by generating target pseudo labels and re-training the model. However, the selection of confident pseudo labels inevitably suffers from a conflict between sparsity and accuracy, and either one leads to suboptimal models. To tackle this problem, we exploit the characteristics of foggy image sequences of driving scenes to densify the confident pseudo labels. Specifically, based on two observations about sequential image data, local spatial similarity and adjacent temporal correspondence, we propose a novel Target-Domain driven pseudo label Diffusion (TDo-Dif) scheme. It employs superpixels and optical flows to identify spatial similarity and temporal correspondence, respectively. It then diffuses the confident but sparse pseudo labels within a superpixel or across a temporally corresponding pair linked by the flow. Moreover, to ensure the feature similarity of the diffused pixels, we introduce a local spatial similarity loss and a temporal contrastive loss in the model re-training stage. Experimental results show that our TDo-Dif scheme helps the adaptive model achieve 51.92% and 53.84% mean intersection-over-union (mIoU) on two publicly available natural foggy datasets, Foggy Zurich and Foggy Driving, respectively. This exceeds the state-of-the-art unsupervised domain adaptive semantic segmentation methods. Models and data can be found at https://github.com/velor2012/TDo-Dif.
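The abstract describes label diffusion only at a high level. Below is a minimal NumPy sketch of the general idea under stated assumptions; it is not the TDo-Dif implementation from the linked repository. The sketch works in three steps:

- Sparse pseudo labels are obtained by thresholding the per-pixel softmax confidence.
- Labels are densified spatially by giving each unlabeled pixel in a superpixel the majority label of that superpixel's confident pixels.
- Labels are densified temporally by copying labels from the previous frame along a backward optical flow, using nearest-neighbour lookup.

The following are all illustrative assumptions: the threshold value, the hypothetical `min_ratio` agreement rule, the `IGNORE` value 255, the (dx, dy) flow layout, and the function names. Superpixel maps and flows are taken as precomputed inputs.

```python
import numpy as np

IGNORE = 255  # assumed label value meaning "no pseudo label"


def select_confident(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Turn class probabilities of shape (C, H, W) into sparse labels (H, W).

    Pixels whose top-class probability is below `threshold` get IGNORE.
    A higher threshold gives more accurate but sparser labels.
    """
    labels = probs.argmax(axis=0).astype(np.int64)
    labels[probs.max(axis=0) < threshold] = IGNORE
    return labels


def diffuse_spatial(labels: np.ndarray, superpixels: np.ndarray,
                    min_ratio: float = 0.5) -> np.ndarray:
    """Fill unlabeled pixels of each superpixel with its majority confident label.

    A superpixel is densified only if the majority label covers at least
    `min_ratio` of its confident pixels (hypothetical agreement rule).
    """
    out = labels.copy()
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        confident = labels[mask]
        confident = confident[confident != IGNORE]
        if confident.size == 0:
            continue  # nothing confident to diffuse in this superpixel
        values, counts = np.unique(confident, return_counts=True)
        best = counts.argmax()
        if counts[best] / confident.size >= min_ratio:
            out[mask & (labels == IGNORE)] = values[best]
    return out


def diffuse_temporal(cur: np.ndarray, prev: np.ndarray,
                     flow: np.ndarray) -> np.ndarray:
    """Fill unlabeled pixels of the current frame from the previous frame.

    `flow` has shape (H, W, 2); flow[y, x] = (dx, dy) maps current pixel
    (y, x) to (y + dy, x + dx) in the previous frame (assumed backward flow).
    Targets are rounded to the nearest pixel; out-of-frame targets are skipped.
    """
    h, w = cur.shape
    ys, xs = np.mgrid[0:h, 0:w]
    px = np.rint(xs + flow[..., 0]).astype(np.int64)
    py = np.rint(ys + flow[..., 1]).astype(np.int64)
    valid = (py >= 0) & (py < h) & (px >= 0) & (px < w)
    warped = np.full_like(cur, IGNORE)
    warped[valid] = prev[py[valid], px[valid]]
    out = cur.copy()
    fill = (cur == IGNORE) & (warped != IGNORE)
    out[fill] = warped[fill]
    return out
```

Both diffusion steps write only to pixels that are still `IGNORE`, so they never overwrite a confident label. This mirrors the abstract's idea of densifying confident labels rather than replacing them. The re-training losses, the local spatial similarity loss and the temporal contrastive loss, are not covered by this sketch.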
