Webly Supervised Concept Expansion for General Purpose Vision Models

Paper Title

Webly Supervised Concept Expansion for General Purpose Vision Models

Authors

Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

Abstract

General Purpose Vision (GPV) systems are models that are designed to solve a wide array of visual tasks without requiring architectural changes. Today, GPVs primarily learn both skills and concepts from large fully supervised datasets. Scaling GPVs to tens of thousands of concepts by acquiring data to learn each concept for every skill quickly becomes prohibitive. This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills. We use a dataset of 1M+ images spanning 10k+ visual concepts to demonstrate webly-supervised concept expansion for two existing GPVs (GPV-1 and VL-T5) on 3 benchmarks: 5 COCO-based datasets (80 primary concepts), a newly curated series of 5 datasets based on the OpenImages and VisualGenome repositories (~500 concepts), and the Web-derived dataset (10k+ concepts). We also propose a new architecture, GPV-2 that supports a variety of tasks -- from vision tasks like classification and localization to vision+language tasks like QA and captioning, to more niche ones like human-object interaction detection. GPV-2 benefits hugely from web data and outperforms GPV-1 and VL-T5 across these benchmarks. Our data, code, and web demo are available at https://prior.allenai.org/projects/gpv2.
