Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training

Title

Video Shadow Detection via Spatio-Temporal Interpolation Consistency Training

Authors

Lu, Xiao; Cao, Yihong; Liu, Sheng; Long, Chengjiang; Chen, Zipei; Zhou, Xuanyu; Yang, Yimin; Xiao, Chunxia

Abstract

Annotating large-scale datasets for supervised video shadow detection is challenging. Applying a model trained on labeled images directly to video frames may lead to high generalization error and temporally inconsistent results. In this paper, we address these challenges by proposing a Spatio-Temporal Interpolation Consistency Training (STICT) framework that rationally feeds unlabeled video frames, together with labeled images, into the training of an image shadow detection network. Specifically, we propose Spatial and Temporal ICT, in which we define two new interpolation schemes, i.e., spatial interpolation and temporal interpolation. We then derive the corresponding spatial and temporal interpolation consistency constraints, which respectively enhance generalization in the pixel-wise classification task and encourage temporally consistent predictions. In addition, we design a Scale-Aware Network for multi-scale shadow knowledge learning in images, and propose a scale-consistency constraint that minimizes the discrepancy among the predictions at different scales. We extensively validate the proposed approach on the ViSha dataset and a self-annotated dataset. Experimental results show that, even without video labels, our approach outperforms most state-of-the-art supervised, semi-supervised, and unsupervised image/video shadow detection methods, as well as methods from related tasks. Code and dataset are available at https://github.com/yihong-97/STICT.
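The abstract does not define the losses, so what follows is only a minimal NumPy sketch under stated assumptions. It illustrates the generic interpolation consistency training (ICT) idea from semi-supervised learning: a student's prediction on an interpolated unlabeled input is pulled toward the same interpolation of a teacher's predictions on the two original inputs. It also gives one plausible form of a scale-consistency penalty. The helpers `toy_shadow_net`, `interpolation_consistency_loss`, `upsample` and `scale_consistency_loss` are hypothetical, and so are the squared-error form, the Beta(0.5, 0.5) mixing coefficient and the stride-based downsampling. STICT's actual spatial and temporal interpolation schemes and its Scale-Aware Network are defined in the paper and repository, not here.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def toy_shadow_net(weight: float, bias: float):
    """Hypothetical stand-in for a shadow detector: a per-pixel logistic map."""
    return lambda frame: sigmoid(weight * frame + bias)


def interpolation_consistency_loss(student, teacher, x_a, x_b, lam):
    """ICT-style loss on two unlabeled frames (no labels needed).

    The student's prediction on the interpolated input is compared with the
    same interpolation of the teacher's predictions on the two inputs.
    """
    x_mix = lam * x_a + (1.0 - lam) * x_b
    # In a real training loop this target is treated as a constant (no gradient).
    target = lam * teacher(x_a) + (1.0 - lam) * teacher(x_b)
    return float(np.mean((student(x_mix) - target) ** 2))


def upsample(pred: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a 2-D map by an integer factor."""
    return np.repeat(np.repeat(pred, factor, axis=0), factor, axis=1)


def scale_consistency_loss(preds_by_factor):
    """Penalize disagreement among predictions made at several scales (assumed form).

    preds_by_factor maps an integer downsampling factor to a 2-D prediction;
    each map is upsampled to full resolution and compared with their mean.
    """
    full = np.stack([upsample(p, f) for f, p in preds_by_factor.items()])
    return float(np.mean((full - full.mean(axis=0)) ** 2))


rng = np.random.default_rng(0)
# Hypothetical pairing: two unrelated frames. A temporal variant might instead
# pair nearby frames of one clip; that mapping is an assumption, not the paper's.
frame_a, frame_b = rng.random((8, 8)), rng.random((8, 8))
student = toy_shadow_net(4.0, -2.0)
teacher = toy_shadow_net(4.0, -2.0)  # e.g. an exponential-moving-average copy
lam = rng.beta(0.5, 0.5)  # mixing coefficient drawn from Beta(alpha, alpha)

ict = interpolation_consistency_loss(student, teacher, frame_a, frame_b, lam)
# Crude multi-scale inputs: stride-based downsampling of the same frame.
scales = {1: student(frame_a), 2: student(frame_a[::2, ::2]), 4: student(frame_a[::4, ::4])}
print(f"interpolation loss={ict:.4f}, scale loss={scale_consistency_loss(scales):.4f}")
```

Because the loss only compares predictions with other predictions, it can be computed on unlabeled video frames, which is what lets such a framework add them alongside labeled images. In the original ICT formulation the teacher is a moving average of the student, which keeps the target stable; the sketch uses two identical toy functions only for simplicity.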