LSMVOS: Long-Short-Term Similarity Matching for Video Object

Paper title

LSMVOS: Long-Short-Term Similarity Matching for Video Object

Authors

Xuerui Zhang, Xia Yuan

Abstract

Objective: Semi-supervised video object segmentation is the task of segmenting an object in subsequent frames given its label in the first frame. Existing algorithms are mostly based on matching and propagation strategies, which often use the previous frame together with its mask or optical flow. This paper explores a new propagation method that uses a short-term matching module to extract information from the previous frame and apply it in propagation, and proposes a network for Long-Short-Term similarity matching for video object segmentation (LSMVOS). Method: A long-term matching module and a short-term matching module perform pixel-level matching and correlation against the first frame and the previous frame, respectively, yielding a global similarity map and a local similarity map, along with the feature map of the current frame and the mask of the previous frame. After two refinement networks, the final result is produced by a segmentation network. Results: In experiments on the DAVIS 2016 and DAVIS 2017 datasets, the method achieves a favorable mean of region similarity and contour accuracy without online fine-tuning: 86.5% for single-object and 77.4% for multi-object segmentation. It segments 21 frames per second. Conclusion: The proposed short-term matching module extracts information from the previous frame more effectively than using the mask alone. By combining the long-term and short-term matching modules, the network achieves efficient video object segmentation without online fine-tuning.
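
The abstract does not specify how the two similarity maps are computed, so the following is only a minimal sketch of pixel-level matching under stated assumptions: features are compared by cosine similarity, the global map takes the best match against all foreground pixels of the first frame, and the local map takes the best match against foreground pixels of the previous frame within a small window. The function names, the `radius` parameter, and the use of a maximum over matches are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(feat, eps=1e-8):
    # feat: (H, W, C) feature map; normalize each pixel's feature vector.
    norm = np.linalg.norm(feat, axis=-1, keepdims=True)
    return feat / (norm + eps)

def global_similarity(cur_feat, ref_feat, ref_mask):
    # Assumed long-term matching: for every pixel of the current frame,
    # the best cosine similarity to any foreground pixel of the reference
    # (first) frame, searched over the whole image.
    h, w, c = cur_feat.shape
    cur = l2_normalize(cur_feat).reshape(-1, c)
    ref = l2_normalize(ref_feat).reshape(-1, c)
    fg = ref[ref_mask.reshape(-1) > 0]
    if fg.shape[0] == 0:
        return np.zeros((h, w))
    return (cur @ fg.T).max(axis=1).reshape(h, w)

def local_similarity(cur_feat, prev_feat, prev_mask, radius=1):
    # Assumed short-term matching: same idea, but each pixel is compared
    # only with foreground pixels of the previous frame that lie within
    # a (2*radius+1) x (2*radius+1) window around it.
    h, w, _ = cur_feat.shape
    cur = l2_normalize(cur_feat)
    pad = ((radius, radius), (radius, radius), (0, 0))
    prev = np.pad(l2_normalize(prev_feat), pad)
    mask = np.pad(prev_mask > 0, radius)
    # -1.0 is the lowest possible cosine similarity; pixels with no
    # foreground neighbor in the window keep this value.
    out = np.full((h, w), -1.0)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = prev[dy:dy + h, dx:dx + w]
            valid = mask[dy:dy + h, dx:dx + w]
            sim = (cur * shifted).sum(axis=-1)
            out = np.where(valid, np.maximum(out, sim), out)
    return out
```

Restricting the local search to a window reflects the assumption that an object moves little between consecutive frames, while the global search tolerates large appearance or position changes relative to the first frame.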
