Text as Neural Operator: Image Manipulation by Text Instruction

Paper Title

Text as Neural Operator: Image Manipulation by Text Instruction

Authors

Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa

Abstract

In recent years, text-guided image manipulation has gained increasing attention in the multimedia and computer vision communities. The input to conditional image generation has evolved from images alone to multimodal inputs. In this paper, we study a setting that allows users to edit an image containing multiple objects using complex text instructions that add, remove, or change objects. The inputs of the task are multimodal: (1) a reference image and (2) a natural-language instruction describing the desired modifications to the image. We propose a GAN-based method to tackle this problem. The key idea is to treat text as neural operators that locally modify the image features. We show that the proposed model performs favorably against recent strong baselines on three public datasets. Specifically, it generates images of greater fidelity and semantic relevance, and, when the generated image is used as an image query, it leads to better retrieval performance.
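The abstract states the key idea (text acting as an operator that locally modifies image features) without giving the architecture. The sketch below is only a toy illustration of that idea under stated assumptions, not the paper's model: it assumes the instruction has already been encoded as a vector, that one hypothetical weight matrix (`w_query`) turns it into a query scoring *where* to edit, and that another hypothetical matrix (`w_delta`) turns it into a residual describing *how* to edit. The names `apply_text_operator`, `w_query` and `w_delta` are invented for illustration, and the sigmoid soft mask is an assumed gating choice.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
FeatureMap = List[List[Vector]]  # H rows x W columns x C channels


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x len(x)) weight matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def apply_text_operator(
    features: FeatureMap,
    text_emb: Vector,
    w_query: Matrix,
    w_delta: Matrix,
) -> Tuple[FeatureMap, List[List[float]]]:
    """Hypothetical text-conditioned local edit of an image feature map.

    The text embedding is mapped to a query (which locations to edit) and a
    residual delta (how to change them). Each location's feature vector gets
    delta scaled by a soft mask in (0, 1), so low-scoring locations stay
    close to their original values.
    """
    query = matvec(w_query, text_emb)  # C-dim: scores "where" to edit
    delta = matvec(w_delta, text_emb)  # C-dim: "how" to edit
    edited: FeatureMap = []
    mask: List[List[float]] = []
    for row in features:
        new_row, mask_row = [], []
        for f in row:
            # Soft spatial mask from the similarity between feature and query.
            m = sigmoid(sum(fi * qi for fi, qi in zip(f, query)))
            new_row.append([fi + m * di for fi, di in zip(f, delta)])
            mask_row.append(m)
        edited.append(new_row)
        mask.append(mask_row)
    return edited, mask


if __name__ == "__main__":
    # Toy 1x2 feature map with C = 2 channels.
    features = [[[10.0, 0.0], [-10.0, 0.0]]]
    text_emb = [1.0, 0.0]
    w_query = [[1.0, 0.0], [0.0, 1.0]]  # query equals text_emb
    w_delta = [[0.0, 0.0], [1.0, 0.0]]  # delta equals [0.0, 1.0]
    edited, mask = apply_text_operator(features, text_emb, w_query, w_delta)
    # The mask is close to 1 at the first location and close to 0 at the second,
    # so only the first location's second channel is noticeably changed.
    print([round(m, 4) for m in mask[0]])
    print(edited)
```

The residual-plus-mask form is what makes the edit "local" in this sketch: locations the instruction does not refer to receive a near-zero mask and keep their features, which is one plausible reading of "locally modify the image features". In a real model both matrices would be learned and the features would come from an image encoder.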