FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic Descriptions, and Conceptual Relations

Paper title

FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations

Authors

Lingjie Mei, Jiayuan Mao, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum

Abstract

We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept with other concepts. The learned concepts support downstream applications, such as answering questions by reasoning about unseen images. Our model, namely FALCON, represents individual visual concepts, such as colors and shapes, as axis-aligned boxes in a high-dimensional space (the "box embedding space"). Given an input image and its paired sentence, our model first resolves the referential expression in the sentence and associates the novel concept with particular objects in the scene. Next, our model interprets supplemental sentences to relate the novel concept with other known concepts, such as "X has property Y" or "X is a kind of Y". Finally, it infers an optimal box embedding for the novel concept that jointly 1) maximizes the likelihood of the observed instances in the image, and 2) satisfies the relationships between the novel concepts and the known ones. We demonstrate the effectiveness of our model on both synthetic and real-world datasets.
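To make the box-embedding idea concrete, here is a minimal sketch of how a concept could be stored as an axis-aligned box, how an "X is a kind of Y" relation could be scored as containment, and how a box for a novel concept could be fit from a few examples. The names `Box`, `entailment_score`, and `fit_novel_box` are hypothetical and do not come from the paper. The fitting step is a simple bounding-box heuristic swapped in for illustration; it is not FALCON's learned inference, which the abstract describes only at a high level.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Box:
    """Axis-aligned box stored as per-dimension lower and upper corners."""
    lower: Point
    upper: Point

    def volume(self) -> float:
        # Product of side lengths; an empty or inverted side contributes 0.
        v = 1.0
        for lo, hi in zip(self.lower, self.upper):
            v *= max(hi - lo, 0.0)
        return v

    def intersect(self, other: "Box") -> "Box":
        # The intersection of two axis-aligned boxes is again such a box.
        return Box(
            tuple(max(a, b) for a, b in zip(self.lower, other.lower)),
            tuple(min(a, b) for a, b in zip(self.upper, other.upper)),
        )

    def contains(self, point: Point) -> bool:
        return all(lo <= x <= hi
                   for lo, x, hi in zip(self.lower, point, self.upper))


def entailment_score(child: Box, parent: Box) -> float:
    """Fraction of the child's volume inside the parent.

    A score of 1.0 means the child box lies entirely inside the parent,
    which is one way to read "child is a kind of parent".
    """
    vol = child.volume()
    if vol == 0.0:
        return 0.0
    return child.intersect(parent).volume() / vol


def fit_novel_box(examples: Sequence[Point], parent: Box,
                  margin: float = 0.5) -> Box:
    """Illustrative heuristic (an assumption, not the paper's method):
    take the bounding box of the example embeddings, pad it by `margin`,
    then clip it to the known parent concept so the relation holds."""
    dims = range(len(examples[0]))
    lower = tuple(min(p[d] for p in examples) - margin for d in dims)
    upper = tuple(max(p[d] for p in examples) + margin for d in dims)
    return Box(lower, upper).intersect(parent)


# Usage: a known concept and two example embeddings of a novel concept.
known = Box((0.0, 0.0), (4.0, 4.0))
examples = [(1.0, 1.0), (2.0, 3.0)]
novel = fit_novel_box(examples, known)
```

In this toy setup the fitted box covers both examples (the "observed instances" objective) and lies inside the known box (the "relation" objective), mirroring the two criteria in the abstract in a deliberately simplified, non-probabilistic form.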
