Paper Title
You Only Need Adversarial Supervision for Semantic Image Synthesis
Paper Authors
Paper Abstract
Despite their recent successes, GAN models for semantic image synthesis still suffer from poor image quality when trained with only adversarial supervision. Historically, additionally employing the VGG-based perceptual loss has helped to overcome this issue, significantly improving the synthesis quality, but at the same time limiting the progress of GAN models for semantic image synthesis. In this work, we propose a novel, simplified GAN model, which needs only adversarial supervision to achieve high quality results. We re-design the discriminator as a semantic segmentation network, directly using the given semantic label maps as the ground truth for training. By providing stronger supervision to the discriminator as well as to the generator through spatially- and semantically-aware discriminator feedback, we are able to synthesize images of higher fidelity with better alignment to their input label maps, making the use of the perceptual loss superfluous. Moreover, we enable high-quality multi-modal image synthesis through global and local sampling of a 3D noise tensor injected into the generator, which allows complete or partial image change. We show that images synthesized by our model are more diverse and follow the color and texture distributions of real images more closely. We achieve an average improvement of $6$ FID and $5$ mIoU points over the state of the art across different datasets using only adversarial supervision.
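The re-designed discriminator described in the abstract can be understood as a per-pixel segmentation problem over N+1 classes: real pixels are supervised with the given semantic label map, while all pixels of a synthesized image are assigned an extra "fake" class. The following is a minimal NumPy sketch of such a loss, not the authors' implementation; the function name, shapes, and numerical details are illustrative assumptions.

```python
import numpy as np

def seg_discriminator_loss(logits, label_map, is_real, n_classes):
    """Per-pixel (N+1)-class cross-entropy for a segmentation-based
    discriminator (illustrative sketch).

    logits:    (n_classes + 1, H, W) raw class scores per pixel
    label_map: (H, W) integer semantic labels in [0, n_classes)
    is_real:   True -> supervise with the label map;
               False -> supervise every pixel with the extra 'fake' class.
    """
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=0, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=0, keepdims=True)
    if is_real:
        target = label_map                              # ground-truth classes
    else:
        target = np.full(label_map.shape, n_classes)    # the 'fake' class
    h, w = label_map.shape
    # Pick the predicted probability of the target class at each pixel.
    picked = probs[target, np.arange(h)[:, None], np.arange(w)[None, :]]
    return -np.log(picked + 1e-8).mean()

# Illustrative usage with random logits and labels.
rng = np.random.default_rng(0)
n_classes, h, w = 3, 4, 4
logits = rng.normal(size=(n_classes + 1, h, w))
labels = rng.integers(0, n_classes, size=(h, w))
loss_real = seg_discriminator_loss(logits, labels, is_real=True, n_classes=n_classes)
loss_fake = seg_discriminator_loss(logits, labels, is_real=False, n_classes=n_classes)
```

Because the supervision is spatial and class-aware, the generator receives per-pixel, per-class feedback rather than a single real/fake score, which is what the abstract credits with making the VGG perceptual loss unnecessary.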