Paper Title
Implicit Motion Handling for Video Camouflaged Object Detection
Paper Authors
Paper Abstract
We propose a new video camouflaged object detection (VCOD) framework that can exploit both short-term dynamics and long-term temporal consistency to detect camouflaged objects from video frames. An essential property of camouflaged objects is that they usually exhibit patterns similar to the background, making them hard to identify in still images. Effectively handling temporal dynamics in videos therefore becomes key to the VCOD task, as camouflaged objects become noticeable when they move. However, current VCOD methods often leverage homographies or optical flow to represent motion, so detection errors can accumulate from both the motion estimation error and the segmentation error. In contrast, our method unifies motion estimation and object segmentation within a single optimization framework. Specifically, we build a dense correlation volume to implicitly capture motion between neighbouring frames and use the final segmentation supervision to optimize the implicit motion estimation and the segmentation jointly. Furthermore, to enforce temporal consistency within a video sequence, we employ a spatio-temporal transformer to refine the short-term predictions. Extensive experiments on VCOD benchmarks demonstrate the architectural effectiveness of our approach. We also provide a large-scale VCOD dataset named MoCA-Mask with pixel-level hand-crafted ground-truth masks and construct a comprehensive VCOD benchmark with previous methods to facilitate research in this direction. Dataset Link: https://xueliancheng.github.io/SLT-Net-project.
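To make the "implicit motion handling" idea concrete, below is a minimal, hedged sketch (in PyTorch) of how a dense correlation volume between two neighbouring frames can be built and fed to a segmentation head that is trained only with the final segmentation loss, so motion is never explicitly supervised. This is not the authors' SLT-Net code: the function and class names (dense_correlation_volume, ImplicitMotionSegHead), feature dimensions, and the tiny decoder are illustrative assumptions.

# Minimal sketch, assuming PyTorch; names and dimensions are illustrative, not SLT-Net's.
import torch
import torch.nn as nn
import torch.nn.functional as F


def dense_correlation_volume(feat_ref, feat_nbr):
    """All-pairs correlation between reference and neighbour frame features.

    feat_ref, feat_nbr: (B, C, H, W) feature maps of two neighbouring frames.
    Returns a (B, H*W, H, W) volume: for every reference location, its
    similarity to every neighbour location (an implicit motion cue).
    """
    b, c, h, w = feat_ref.shape
    ref = feat_ref.flatten(2)                      # (B, C, H*W)
    nbr = feat_nbr.flatten(2)                      # (B, C, H*W)
    corr = torch.einsum('bci,bcj->bij', ref, nbr)  # (B, H*W, H*W)
    corr = corr / (c ** 0.5)                       # scale, as in attention
    return corr.view(b, h * w, h, w)


class ImplicitMotionSegHead(nn.Module):
    """Predicts a camouflage mask from reference features plus the correlation
    volume; no explicit optical flow or homography is ever supervised."""

    def __init__(self, feat_dim, corr_dim):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(feat_dim + corr_dim, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),                   # 1-channel mask logits
        )

    def forward(self, feat_ref, corr_volume):
        x = torch.cat([feat_ref, corr_volume], dim=1)
        return self.fuse(x)


if __name__ == "__main__":
    b, c, h, w = 2, 32, 24, 24
    feat_t, feat_t1 = torch.randn(b, c, h, w), torch.randn(b, c, h, w)
    corr = dense_correlation_volume(feat_t, feat_t1)      # (B, H*W, H, W)
    head = ImplicitMotionSegHead(feat_dim=c, corr_dim=h * w)
    logits = head(feat_t, corr)                           # (B, 1, H, W)
    gt_mask = torch.randint(0, 2, (b, 1, h, w)).float()
    # Only the segmentation loss is back-propagated; the implicit motion
    # representation is optimized jointly through this single objective.
    loss = F.binary_cross_entropy_with_logits(logits, gt_mask)
    loss.backward()

In the paper's framing, this single segmentation objective is what distinguishes implicit motion handling from pipelines that first estimate optical flow or a homography and then segment, where errors from the two stages can accumulate. The long-term refinement with a spatio-temporal transformer operates on such short-term predictions and is not sketched here.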