Paper Title


"This is my unicorn, Fluffy": Personalizing frozen vision-language representations

Paper Authors

Niv Cohen, Rinon Gal, Eli A. Meirom, Gal Chechik, Yuval Atzmon

Abstract


Large Vision & Language models pretrained on web-scale data provide representations that are invaluable for numerous V&L problems. However, it is unclear how they can be used for reasoning about user-specific visual concepts in unstructured language. This problem arises in multiple domains, from personalized image retrieval to personalized interaction with smart devices. We introduce a new learning setup called Personalized Vision & Language (PerVL) with two new benchmark datasets for retrieving and segmenting user-specific "personalized" concepts "in the wild". In PerVL, one should learn personalized concepts (1) independently of the downstream task, (2) in a way that allows a pretrained model to reason about them with free language, and (3) without requiring personalized negative examples. We propose an architecture for solving PerVL that operates by extending the input vocabulary of a pretrained model with new word embeddings for the new personalized concepts. The model can then reason about them by simply using them in a sentence. We demonstrate that our approach learns personalized visual concepts from a few examples and can effectively apply them in image retrieval and semantic segmentation using rich textual queries.
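The core idea of "extending the input vocabulary of a frozen model with a new word embedding" can be illustrated with a short sketch. This is not the paper's actual implementation; it is a minimal, textual-inversion-style approximation using a frozen CLIP model from Hugging Face `transformers`. The pseudo-word `<fluffy>`, the initialization word "toy", the example file names, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact method): personalize a frozen CLIP model
# by adding one new word embedding and fitting it to a few positive example images.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = processor.tokenizer

# 1) Extend the vocabulary with a pseudo-word for the personalized concept.
tokenizer.add_tokens(["<fluffy>"])
new_id = tokenizer.convert_tokens_to_ids("<fluffy>")

# 2) Grow the text embedding table by one row, initialized from a coarse
#    category word ("toy" is an arbitrary illustrative choice).
old_emb = model.text_model.embeddings.token_embedding
init_id = tokenizer("toy", add_special_tokens=False).input_ids[0]
new_emb = torch.nn.Embedding(old_emb.num_embeddings + 1, old_emb.embedding_dim).to(device)
with torch.no_grad():
    new_emb.weight[: old_emb.num_embeddings] = old_emb.weight
    new_emb.weight[new_id] = old_emb.weight[init_id]
model.text_model.embeddings.token_embedding = new_emb

# 3) Freeze the pretrained model; only the new embedding row is updated.
model.requires_grad_(False)
new_emb.weight.requires_grad_(True)
optimizer = torch.optim.Adam([new_emb.weight], lr=1e-3)

# Encode the few positive examples once with the frozen image encoder.
concept_image_paths = ["fluffy_1.jpg", "fluffy_2.jpg", "fluffy_3.jpg"]  # assumed inputs
images = [Image.open(p).convert("RGB") for p in concept_image_paths]
pixel_values = processor(images=images, return_tensors="pt").pixel_values.to(device)
with torch.no_grad():
    img_feats = model.get_image_features(pixel_values=pixel_values)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)

text_inputs = processor(text=["a photo of <fluffy>"], return_tensors="pt", padding=True).to(device)

for step in range(200):
    txt_feat = model.get_text_features(**text_inputs)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    loss = 1.0 - (txt_feat @ img_feats.T).mean()  # pull the sentence toward the examples
    optimizer.zero_grad()
    loss.backward()
    new_emb.weight.grad[:new_id] = 0  # only the new concept's row receives updates
    optimizer.step()
```

Once trained, the pseudo-word can appear inside free-form queries handled by the frozen model, e.g. scoring images against "a photo of <fluffy> on the sofa" for retrieval.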
