Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

Paper Title

Frequency Domain Image Translation: More Photo-realistic, Better Identity-preserving

Authors

Cai, Mu; Zhang, Hong; Huang, Huijuan; Geng, Qichuan; Li, Yixuan; Huang, Gao

Abstract

Image-to-image translation has been revolutionized with GAN-based methods. However, existing methods lack the ability to preserve the identity of the source domain. As a result, synthesized images can often over-adapt to the reference domain, losing important structural characteristics and suffering from suboptimal visual quality. To solve these challenges, we propose a novel frequency domain image translation (FDIT) framework, exploiting frequency information for enhancing the image generation process. Our key idea is to decompose the image into low-frequency and high-frequency components, where the high-frequency feature captures object structure akin to the identity. Our training objective facilitates the preservation of frequency information in both pixel space and Fourier spectral space. We broadly evaluate FDIT across five large-scale datasets and multiple tasks including image translation and GAN inversion. Extensive experiments and ablations show that FDIT effectively preserves the identity of the source image, and produces photo-realistic images. FDIT establishes state-of-the-art performance, reducing the average FID score by 5.6% compared to the previous best method.
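
To make the decomposition idea concrete, here is a minimal NumPy sketch of splitting an image into low- and high-frequency components and comparing two images in Fourier spectral space. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which filter or loss FDIT uses, so the ideal circular low-pass mask, the `cutoff` parameter, and the function names `split_frequencies` and `spectral_l1` are all hypothetical.

```python
import numpy as np


def split_frequencies(image, cutoff=0.1):
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Assumption: an ideal circular low-pass mask in the Fourier domain.
    `cutoff` (hypothetical) is the mask radius in cycles per pixel (0 to 0.5).
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles per pixel
    mask = np.sqrt(fx ** 2 + fy ** 2) <= cutoff

    spectrum = np.fft.fft2(image)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to rounding error; .real discards the residual imaginary part.
    low = np.fft.ifft2(spectrum * mask).real
    high = image - low  # whatever the low-pass filter removed
    return low, high


def spectral_l1(a, b):
    """Mean absolute difference between two Fourier amplitude spectra.

    Assumption: a simple stand-in for a Fourier-space consistency term.
    """
    amp_a = np.abs(np.fft.fft2(a))
    amp_b = np.abs(np.fft.fft2(b))
    return float(np.mean(np.abs(amp_a - amp_b)))


# Usage: decompose a random test image and compare high-frequency parts.
rng = np.random.default_rng(0)
source = rng.random((64, 64))
generated = source + 0.05 * rng.standard_normal((64, 64))

_, source_high = split_frequencies(source)
_, generated_high = split_frequencies(generated)
high_freq_gap = spectral_l1(source_high, generated_high)
```

By construction the two components sum back to the input image, and the high-frequency part carries edges and fine structure, which the abstract associates with identity. A training objective in this spirit would penalize a gap such as `high_freq_gap` between the source and the generated image.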
