PhraseCut: Language-based Image Segmentation in the Wild

Paper title

PhraseCut: Language-based Image Segmentation in the Wild

Authors

Chenyun Wu, Zhe Lin, Scott Cohen, Trung Bui, Subhransu Maji

Abstract

We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs. Our dataset is collected on top of the Visual Genome dataset and uses the existing annotations to generate a challenging set of referring phrases for which the corresponding regions are manually annotated. Phrases in our dataset correspond to multiple regions and describe a large number of object and stuff categories as well as their attributes such as color, shape, parts, and relationships with other entities in the image. Our experiments show that the scale and diversity of concepts in our dataset pose significant challenges to the existing state of the art. We systematically handle the long-tail nature of these concepts and present a modular approach to combine category, attribute, and relationship cues that outperforms existing approaches.
