Null-text Inversion for Editing Real Images using Guided Diffusion Models

Paper title

Null-text Inversion for Editing Real Images using Guided Diffusion Models

Authors

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

Abstract

Recent text-guided diffusion models provide powerful image generation capabilities. Currently, a massive effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. To edit a real image using these state-of-the-art tools, one must first invert the image with a meaningful text prompt into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image. Our proposed inversion consists of two novel key components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestamp and optimize around it. We demonstrate that a direct inversion is inadequate on its own, but does provide a good anchor for our optimization. (ii) NULL-text optimization, where we only modify the unconditional textual embedding that is used for classifier-free guidance, rather than the input text embedding. This allows for keeping both the model weights and the conditional embedding intact and hence enables applying prompt-based editing while avoiding the cumbersome tuning of the model's weights. Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt editing, showing high-fidelity editing of real images.
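The null-text optimization idea in the abstract can be illustrated with a toy sketch. The example below is not the authors' implementation: `toy_eps` is a hypothetical linear stand-in for the diffusion model's noise predictor, the target is an arbitrary noise vector rather than a pivotal latent obtained from inversion, and all names and hyperparameters are assumptions chosen for illustration. It only shows the core mechanic: under classifier-free guidance, the unconditional ("null") embedding alone is optimized while the conditional embedding and the model weights stay fixed.

```python
import numpy as np

def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one with guidance scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def toy_eps(z, emb, W):
    """Hypothetical stand-in for a noise predictor (assumption):
    linear in the latent z and in the text embedding emb."""
    return 0.5 * z + W @ emb

def optimize_null_embedding(z, cond_emb, null_emb, target_eps, W,
                            w=7.5, lr=0.01, steps=100):
    """Gradient descent on the null embedding only.
    cond_emb and W (the 'model weights') are never modified."""
    null_emb = null_emb.copy()
    for _ in range(steps):
        pred = guided_eps(toy_eps(z, cond_emb, W), toy_eps(z, null_emb, W), w)
        # pred depends on null_emb through (1 - w) * W @ null_emb, so the
        # gradient of ||pred - target||^2 is 2 * (1 - w) * W.T @ (pred - target).
        grad = 2.0 * (1.0 - w) * (W.T @ (pred - target_eps))
        null_emb -= lr * grad
    return null_emb

rng = np.random.default_rng(0)
d = 4
W = np.eye(d)                    # toy "weights", kept frozen
z = rng.normal(size=d)           # toy latent at one timestep
cond_emb = rng.normal(size=d)    # conditional (prompt) embedding, kept frozen
cond_before = cond_emb.copy()
null_init = np.zeros(d)          # initial null-text embedding
target_eps = rng.normal(size=d)  # toy target standing in for the pivot

null_opt = optimize_null_embedding(z, cond_emb, null_init, target_eps, W)
final = guided_eps(toy_eps(z, cond_emb, W), toy_eps(z, null_opt, W), 7.5)
loss = float(np.sum((final - target_eps) ** 2))
print("final squared error:", loss)
```

In the actual method the optimization is performed per timestep against a pivotal trajectory and with a real denoising network; the sketch only mirrors the choice of which variable is trained and which are left intact.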
