Paper Title

RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation

Paper Authors

Miriam Bellver, Carles Ventura, Carina Silberer, Ioannis Kazakos, Jordi Torres, Xavier Giro-i-Nieto

Paper Abstract

The task of video object segmentation with referring expressions (language-guided VOS) is to, given a linguistic phrase and a video, generate binary masks for the object to which the phrase refers. Our work argues that existing benchmarks used for this task are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the phrases in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, with the non-trivial REs annotated with seven RE semantic categories. We leverage this data to analyze the results of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state of the art results for language-guided VOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
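The task has a simple input/output contract: a referring expression plus a sequence of video frames in, one binary mask per frame out. The sketch below illustrates only that contract; the function name segment_referent and its placeholder body are hypothetical and not the authors' RefVOS implementation.

# Hypothetical interface sketch for language-guided VOS (illustration only).
import numpy as np

def segment_referent(expression: str, frames: np.ndarray) -> np.ndarray:
    """Return one binary mask per frame for the object the expression refers to.

    frames: array of shape (T, H, W, 3).
    returns: boolean array of shape (T, H, W).

    A real model such as RefVOS would fuse language and visual features here;
    this placeholder returns empty masks purely to show the I/O shapes.
    """
    t, h, w, _ = frames.shape
    return np.zeros((t, h, w), dtype=bool)

# Example usage with dummy data:
video = np.zeros((4, 120, 160, 3), dtype=np.uint8)  # 4 dummy frames
masks = segment_referent("the person jumping over the fence", video)
print(masks.shape)  # (4, 120, 160)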
