Paper Title
Visual Recognition by Request
Paper Authors
Paper Abstract
Humans have the ability to recognize visual semantics at unlimited granularity, but existing visual recognition algorithms cannot achieve this goal. In this paper, we establish a new paradigm named visual recognition by request (ViRReq) to bridge the gap. The key lies in decomposing visual recognition into atomic tasks named requests and leveraging a knowledge base, a hierarchical and text-based dictionary, to assist task definition. ViRReq allows for (i) learning complicated whole-part hierarchies from highly incomplete annotations and (ii) inserting new concepts with minimal effort. We also establish a solid baseline by integrating language-driven recognition into recent semantic and instance segmentation methods, and demonstrate its flexible recognition ability on CPP and ADE20K, two datasets with hierarchical whole-part annotations.
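To make the two ingredients named in the abstract more concrete, the following is a minimal, illustrative Python sketch of how a hierarchical, text-based knowledge base and an atomic "request" derived from it might be modeled. It is an assumption for illustration only and not the paper's implementation; the names Concept, Request, and build_request are hypothetical.

# Illustrative sketch only (not from the paper): a toy whole-part knowledge
# base and a single atomic request queried against it.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """A node in the knowledge base: a named concept with optional parts."""
    name: str
    parts: Dict[str, "Concept"] = field(default_factory=dict)

    def add_part(self, part: "Concept") -> None:
        self.parts[part.name] = part


@dataclass
class Request:
    """An atomic recognition task: segment the parts of one instance of a concept."""
    concept: str                 # e.g. "person"
    instance_id: Optional[int]   # which instance in the image, if any
    candidate_parts: List[str]   # part names taken from the knowledge base


def build_request(kb: Concept, concept_name: str, instance_id: int) -> Request:
    """Look up a concept in the toy knowledge base and turn it into a request."""
    node = kb.parts[concept_name]
    return Request(concept=concept_name,
                   instance_id=instance_id,
                   candidate_parts=list(node.parts))


if __name__ == "__main__":
    # Toy whole-part hierarchy: scene -> person -> {head, torso, arm, leg}
    root = Concept("scene")
    person = Concept("person")
    for part in ("head", "torso", "arm", "leg"):
        person.add_part(Concept(part))
    root.add_part(person)

    req = build_request(root, "person", instance_id=0)
    # Prints: Request(concept='person', instance_id=0,
    #                 candidate_parts=['head', 'torso', 'arm', 'leg'])
    print(req)

In this toy setup, inserting a new concept only requires adding a node to the dictionary, which mirrors the abstract's claim that new concepts can be plugged in with minimal effort; how the recognition model consumes such requests is defined in the paper itself.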