SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

Paper Title

SWEM: Towards Real-Time Video Object Segmentation with Sequential Weighted Expectation-Maximization

Authors

Zhihui Lin, Tianyu Yang, Maomao Li, Ziyu Wang, Chun Yuan, Wenhao Jiang, Wei Liu

Abstract

Matching-based methods, especially those based on space-time memory, are significantly ahead of other solutions in semi-supervised video object segmentation (VOS). However, continuously growing and redundant template features lead to inefficient inference. To alleviate this, we propose a novel Sequential Weighted Expectation-Maximization (SWEM) network that greatly reduces the redundancy of memory features. Unlike previous methods, which only detect feature redundancy between frames, SWEM merges both intra-frame and inter-frame similar features by leveraging the sequential weighted EM algorithm. Further, adaptive weights for frame features endow SWEM with the flexibility to represent hard samples, improving the discrimination of templates. In addition, the proposed method maintains a fixed number of template features in memory, which keeps the inference complexity of the VOS system stable. Extensive experiments on the commonly used DAVIS and YouTube-VOS datasets verify the high efficiency (36 FPS) and high performance (84.3\% $\mathcal{J}\&\mathcal{F}$ on the DAVIS 2017 validation set) of SWEM. Code is available at: https://github.com/lmm077/SWEM.
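
The fixed-size memory idea in the abstract can be pictured with a generic weighted EM-style update, closer to a soft k-means with running accumulators than to the paper's exact formulation. The sketch below is an illustration only: the function names (`init_state`, `e_step`, `update_memory`), the dot-product softmax with a `temperature`, the small `prior`, and the per-feature `weights` passed in by the caller are all assumptions and are not taken from the SWEM code. How SWEM actually computes its adaptive weights is not covered here.

```python
import math

def init_state(initial_bases, prior=1e-6):
    """Build (bases, num, den); a tiny prior keeps denominators non-zero."""
    bases = [list(mu) for mu in initial_bases]
    num = [[prior * v for v in mu] for mu in initial_bases]
    den = [prior for _ in initial_bases]
    return bases, num, den

def e_step(features, bases, temperature=1.0):
    """Soft-assign each feature to the K bases (softmax over dot products)."""
    resp = []
    for x in features:
        logits = [sum(a * b for a, b in zip(x, mu)) / temperature for mu in bases]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        resp.append([e / total for e in exps])
    return resp

def update_memory(state, features, weights, n_iters=4, temperature=1.0):
    """Fold one frame into the memory while keeping exactly K bases.

    Every iteration restarts from the accumulators of the previous frames,
    so the current frame is counted once no matter how many iterations run.
    """
    bases, num, den = state
    k, d = len(bases), len(bases[0])
    for _ in range(n_iters):
        resp = e_step(features, bases, temperature)
        new_num = [row[:] for row in num]
        new_den = den[:]
        # Weighted M-step: each feature contributes weight * responsibility.
        for x, w, r in zip(features, weights, resp):
            for j in range(k):
                c = w * r[j]
                new_den[j] += c
                for t in range(d):
                    new_num[j][t] += c * x[t]
        bases = [[new_num[j][t] / new_den[j] for t in range(d)] for j in range(k)]
    return bases, new_num, new_den

# Two frames of toy 2-D features compressed into K = 2 bases.
state = init_state([[1.0, 0.0], [0.0, 1.0]])
state = update_memory(state, [[0.9, 0.1], [0.1, 0.8], [1.0, 0.0]], [1.0, 2.0, 1.0])
state = update_memory(state, [[0.8, 0.2], [0.0, 1.0]], [1.0, 1.0])
print(len(state[0]))  # number of bases is unchanged after both frames
```

Because only the K bases and their accumulators are carried from frame to frame, the memory footprint in this sketch does not grow with video length, which is the property the abstract attributes to keeping a fixed number of template features.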
