Paper Title


CompFeat: Comprehensive Feature Aggregation for Video Instance Segmentation

Authors

Yang Fu, Linjie Yang, Ding Liu, Thomas S. Huang, Humphrey Shi

Abstract


Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video. Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects, and they suffer in the video scenario due to several distinct challenges such as motion blur and drastic appearance change. To eliminate ambiguities introduced by only using single-frame features, we propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both the frame level and the object level with temporal and spatial context information. The aggregation process is carefully designed with a new attention mechanism which significantly increases the discriminative power of the learned features. We further improve the tracking capability of our model through a siamese design by incorporating both feature similarities and spatial similarities. Experiments conducted on the YouTube-VIS dataset validate the effectiveness of the proposed CompFeat. Our code will be available at https://github.com/SHI-Labs/CompFeat-for-Video-Instance-Segmentation.
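The core idea of attention-weighted temporal aggregation (refining a key frame's features with context from neighboring frames) can be sketched as follows. This is a minimal illustration only, assuming dot-product similarity as the attention score; the function name, tensor shapes, and similarity measure are illustrative assumptions, not the exact mechanism described in the paper.

```python
import numpy as np

def aggregate_frame_features(features: np.ndarray, key_idx: int) -> np.ndarray:
    """Hypothetical sketch of attention-based temporal feature aggregation.

    features: (T, C) array, one C-dimensional feature vector per frame.
    key_idx:  index of the key frame to refine.

    Each frame's contribution is weighted by its softmax-normalized
    dot-product similarity to the key frame, so visually similar frames
    contribute more to the refined feature.
    """
    key = features[key_idx]                                # (C,)
    # Scaled dot-product similarity between every frame and the key frame
    scores = features @ key / np.sqrt(features.shape[1])   # (T,)
    # Numerically stable softmax over the temporal axis
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Refined key-frame feature: attention-weighted sum over all frames
    return weights @ features                              # (C,)
```

In the paper's full model this kind of aggregation is applied at both the frame level and the object level, and combined with spatial context; the sketch above only conveys the temporal-attention ingredient.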
