Paper Title

Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention

Paper Authors

Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu

Paper Abstract

Video instance segmentation aims at predicting object segmentation masks for each frame, as well as associating the instances across multiple frames. Recent end-to-end video instance segmentation methods are capable of performing object segmentation and instance association together in a direct parallel sequence decoding/prediction framework. Although these methods generally predict higher-quality object segmentation masks, they can fail to associate instances in challenging cases because they do not explicitly model the temporal instance consistency for adjacent frames. We propose a consistent end-to-end video instance segmentation framework with Inter-Frame Recurrent Attention to model both the temporal instance consistency for adjacent frames and the global temporal context. Our extensive experiments demonstrate that Inter-Frame Recurrent Attention significantly improves temporal instance consistency while maintaining the quality of the object segmentation masks. Our model achieves state-of-the-art accuracy on both the YouTubeVIS-2019 (62.1%) and YouTubeVIS-2021 (54.7%) datasets. In addition, quantitative and qualitative results show that the proposed method predicts more temporally consistent instance segmentation masks.
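The abstract does not give implementation details, but the core idea it describes, carrying instance queries recurrently from one frame to the next while attending to both the previous frame's queries and a global temporal context, can be sketched as follows. This is a minimal illustrative sketch in PyTorch under our own assumptions about module structure, tensor shapes, and the names `InterFrameRecurrentAttention`, `prev_attn`, and `ctx_attn`; it is not the authors' architecture.

```python
import torch
import torch.nn as nn


class InterFrameRecurrentAttention(nn.Module):
    """Illustrative sketch (not the authors' code): instance queries are
    carried recurrently across frames, attending to the previous frame's
    queries (adjacent-frame consistency) and to global temporal context."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # Cross-attention over the previous frame's instance queries.
        self.prev_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention over global temporal context tokens
        # (e.g., pooled per-frame video features; an assumption here).
        self.ctx_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, queries, prev_queries, global_ctx):
        # queries:      (B, N, D) instance queries for the current frame
        # prev_queries: (B, N, D) queries carried over from the previous frame
        # global_ctx:   (B, T, D) global temporal context tokens
        attended, _ = self.prev_attn(queries, prev_queries, prev_queries)
        queries = self.norm1(queries + attended)
        attended, _ = self.ctx_attn(queries, global_ctx, global_ctx)
        return self.norm2(queries + attended)


# Recurrent application across frames: each frame's refined queries become
# the next frame's "previous" queries, so the same query slot can track the
# same instance over time.
if __name__ == "__main__":
    B, N, T, D = 1, 10, 5, 256
    layer = InterFrameRecurrentAttention(D)
    frame_queries = [torch.randn(B, N, D) for _ in range(T)]
    global_ctx = torch.randn(B, T, D)
    prev = frame_queries[0]
    for q in frame_queries:
        prev = layer(q, prev, global_ctx)
```

In this reading, the recurrence is what enforces temporal instance consistency: because query slot i at frame t attends to query slot i's state from frame t-1, the association between frames is modeled explicitly rather than left to a parallel decoder.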
