EDICT: Exact Diffusion Inversion via Coupled Transformations

Paper title

EDICT: Exact Diffusion Inversion via Coupled Transformations

Authors

Bram Wallace, Akash Gokul, Nikhil Naik

Abstract

Finding an initial noise vector that produces an input image when fed into the diffusion process (known as inversion) is an important problem in denoising diffusion models (DDMs), with applications for real image editing. The state-of-the-art approach for real image editing with inversion uses denoising diffusion implicit models (DDIMs) to deterministically noise the image to the intermediate state along the path that the denoising would follow given the original conditioning. However, DDIM inversion for real images is unstable as it relies on local linearization assumptions, which result in the propagation of errors, leading to incorrect image reconstruction and loss of content. To alleviate these problems, we propose Exact Diffusion Inversion via Coupled Transformations (EDICT), an inversion method that draws inspiration from affine coupling layers. EDICT enables mathematically exact inversion of real and model-generated images by maintaining two coupled noise vectors which are used to invert each other in an alternating fashion. Using Stable Diffusion, a state-of-the-art latent diffusion model, we demonstrate that EDICT successfully reconstructs real images with high fidelity. On complex image datasets like MS-COCO, EDICT reconstruction significantly outperforms DDIM, improving the mean square error of reconstruction by a factor of two. Using noise vectors inverted from real images, EDICT enables a wide range of image edits--from local and global semantic edits to image stylization--while maintaining fidelity to the original image structure. EDICT requires no model training/finetuning, prompt tuning, or extra data and can be combined with any pretrained DDM. Code is available at https://github.com/salesforce/EDICT.
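The coupling idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: `toy_eps` is a hypothetical stand-in for a pretrained noise-prediction network, and the coefficients `a`, `b`, `p`, the update order, and the step count are assumptions chosen for illustration. The point it shows is that when each of two sequences is updated using only the other one, every update can be undone algebraically, so inversion needs no linearization.

```python
import math

def toy_eps(v, t):
    # Hypothetical stand-in for a pretrained noise predictor (assumption).
    return [math.sin(3.0 * vi + t) for vi in v]

def forward_step(x, y, t, a=0.9, b=0.1, p=0.93):
    # Coupled update: x changes using only y, then y using only the new x.
    x_mid = [a * xi + b * ei for xi, ei in zip(x, toy_eps(y, t))]
    y_mid = [a * yi + b * ei for yi, ei in zip(y, toy_eps(x_mid, t))]
    # Invertible mixing keeps the two sequences close to each other.
    x_new = [p * xm + (1 - p) * ym for xm, ym in zip(x_mid, y_mid)]
    y_new = [p * ym + (1 - p) * xn for ym, xn in zip(y_mid, x_new)]
    return x_new, y_new

def inverse_step(x_new, y_new, t, a=0.9, b=0.1, p=0.93):
    # Undo the four updates of forward_step in reverse order.
    y_mid = [(yn - (1 - p) * xn) / p for yn, xn in zip(y_new, x_new)]
    x_mid = [(xn - (1 - p) * ym) / p for xn, ym in zip(x_new, y_mid)]
    y = [(ym - b * ei) / a for ym, ei in zip(y_mid, toy_eps(x_mid, t))]
    x = [(xm - b * ei) / a for xm, ei in zip(x_mid, toy_eps(y, t))]
    return x, y

def roundtrip_error(x0, steps=10):
    # Run the coupled process forward, invert it, and report the largest
    # absolute deviation from the starting pair (x0, x0).
    x, y = list(x0), list(x0)
    for t in range(steps):
        x, y = forward_step(x, y, t)
    for t in reversed(range(steps)):
        x, y = inverse_step(x, y, t)
    start = list(x0) + list(x0)
    return max(abs(u - v) for u, v in zip(x + y, start))
```

Calling `roundtrip_error([0.5, -1.2, 2.0])` should return a value at the level of floating-point rounding, because each inverse update recomputes the same `toy_eps` output that the matching forward update used.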
