Paper Title

End-to-end video instance segmentation via spatial-temporal graph neural networks

Authors

Tao Wang, Ning Xu, Kean Chen, Weiyao Lin

Abstract

Video instance segmentation is a challenging task that extends image instance segmentation to the video domain. Existing methods either rely only on single-frame information for the detection and segmentation subproblems or handle tracking as a separate post-processing step, which limits their ability to fully leverage and share useful spatial-temporal information across all the subproblems. In this paper, we propose a novel graph-neural-network (GNN) based method to address this limitation. Specifically, graph nodes representing instance features are used for detection and segmentation, while graph edges representing instance relations are used for tracking. Both inter- and intra-frame information is effectively propagated and shared via graph updates, and all the subproblems (i.e. detection, segmentation, and tracking) are jointly optimized in a unified framework. Our method shows significant improvement over existing methods on the YouTube-VIS validation dataset, achieving 35.2% AP with a ResNet-50 backbone while running at 22 FPS. Code is available at http://github.com/lucaswithai/visgraph.git .
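To make the node/edge division of labor concrete, here is a minimal NumPy sketch of one round of message passing on a two-frame instance graph. The feature dimensions, the tanh update rules, and the argmax association step are illustrative assumptions for exposition only, not the paper's learned architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N instances per frame, each with a D-dim feature.
N, D = 4, 8
prev_nodes = rng.standard_normal((N, D))  # instance features at frame t-1
curr_nodes = rng.standard_normal((N, D))  # instance features at frame t

# One edge feature per (previous, current) instance pair, shape (N, N, D).
edges = prev_nodes[:, None, :] + curr_nodes[None, :, :]

def graph_update(prev_nodes, curr_nodes, edges):
    """One message-passing round (illustrative, not the learned update):
    edges aggregate their two endpoint nodes; each node then aggregates
    its incident edges, mixing inter-frame information into both frames."""
    new_edges = np.tanh(edges + prev_nodes[:, None, :] + curr_nodes[None, :, :])
    new_prev = np.tanh(prev_nodes + new_edges.mean(axis=1))  # messages from frame t
    new_curr = np.tanh(curr_nodes + new_edges.mean(axis=0))  # messages from frame t-1
    return new_prev, new_curr, new_edges

prev_nodes, curr_nodes, edges = graph_update(prev_nodes, curr_nodes, edges)

# Tracking: edges carry instance relations, so a per-edge score (here a
# naive sum, standing in for a learned edge classifier) associates each
# current instance with a previous one.
scores = edges.sum(axis=2)          # (N_prev, N_curr) affinity matrix
assignment = scores.argmax(axis=0)  # previous-frame index for each current instance
```

The updated node features would feed the detection and segmentation heads, while the edge scores drive tracking, which is the sense in which all three subproblems share the propagated spatial-temporal information.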
