Paper title
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions
Authors
Abstract
We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions. It is a challenging task considering the large variation of image domains and the lack of training supervision. Our approach takes advantage of the unified visual-semantic embedding space pretrained on a general image-caption dataset, and manipulates the embedded visual features by applying text-guided vector arithmetic on the image feature maps. A structure-preserving image decoder then generates the manipulated images from the manipulated feature maps. We further propose an on-the-fly sample-specific optimization approach with cycle-consistency constraints to regularize the manipulated images and force them to preserve details of the source images. Our approach shows promising results in manipulating open-vocabulary color, texture, and high-level attributes for various scenarios of open-domain images.
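The "text-guided vector arithmetic on the image feature maps" mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `l2_normalize` and `edit_feature_map`, the strength parameter `alpha`, and the specific update rule (shift each spatial feature from the source-word direction toward the target-word direction, weighted by how strongly that location responds to the source word) are assumptions made for illustration. It also assumes the text embeddings and the image features already live in the same C-dimensional embedding space.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale x to (approximately) unit L2 norm along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def edit_feature_map(feat, t_src, t_tgt, alpha=1.0):
    """Hypothetical text-guided edit of a visual feature map.

    feat  : array of shape (C, H, W), visual features in the joint space
    t_src : array of shape (C,), embedding of the word to replace
    t_tgt : array of shape (C,), embedding of the replacement word
    alpha : edit strength (assumed hyperparameter)
    """
    t_src = l2_normalize(t_src)
    t_tgt = l2_normalize(t_tgt)

    # Per-location response to the source word, shape (H, W).
    relevance = np.einsum("chw,c->hw", feat, t_src)

    # Move each location along (t_tgt - t_src), scaled by its relevance,
    # so locations unrelated to the source word are left almost untouched.
    direction = (t_tgt - t_src)[:, None, None]       # shape (C, 1, 1)
    return feat + alpha * direction * relevance[None, :, :]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(8, 4, 4))
    t_src = rng.normal(size=8)
    t_tgt = rng.normal(size=8)
    edited = edit_feature_map(feat, t_src, t_tgt, alpha=1.0)
    print(edited.shape)
```

Under this assumed rule, a location whose feature equals the normalized source embedding is mapped onto the target embedding when `alpha` is 1, while a location orthogonal to the source embedding is unchanged. In the pipeline described in the abstract, the edited feature map would then be passed to the structure-preserving decoder, which is not modeled here.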