Paper Title
Image-to-Image Translation with Text Guidance
Paper Authors
Paper Abstract
The goal of this paper is to embed controllable factors, i.e., natural language descriptions, into image-to-image translation with generative adversarial networks, allowing text descriptions to determine the visual attributes of synthetic images. We propose four key components: (1) part-of-speech tagging to filter out non-semantic words in the given description, (2) an affine combination module to effectively fuse text and image features across the two modalities, (3) a novel refined multi-stage architecture to strengthen the discriminative ability of discriminators and the rectification ability of generators, and (4) a new structure loss to further improve the discriminators' ability to distinguish real from synthetic images. Extensive experiments on the COCO dataset demonstrate that our method achieves superior performance in both visual realism and semantic consistency with the given descriptions.
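To make component (1) concrete, below is a minimal sketch of how the part-of-speech filtering might look. The abstract does not specify a tagger or which tags count as "non-semantic"; this sketch uses NLTK and assumes that nouns and adjectives carry the visual semantics of a description, which is our assumption rather than the paper's stated rule.

```python
import nltk

# One-time downloads for the tokenizer and tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

# Assumption: nouns (NN*) and adjectives (JJ*) carry the visual
# semantics of a description; all other words are filtered out.
SEMANTIC_PREFIXES = ("NN", "JJ")

def filter_non_semantic(description: str) -> list[str]:
    """Keep only the words whose POS tag marks them as semantic."""
    tokens = nltk.word_tokenize(description)
    return [word for word, tag in nltk.pos_tag(tokens)
            if tag.startswith(SEMANTIC_PREFIXES)]

print(filter_non_semantic("a small red bird with a short beak sits on a branch"))
# Roughly: ['small', 'red', 'bird', 'short', 'beak', 'branch']
```

Similarly, one plausible form of the affine combination module in component (2) is a text-conditioned, element-wise scale and bias applied to the image feature map. The PyTorch sketch below is illustrative; the layer shapes and the per-channel modulation are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AffineCombination(nn.Module):
    """Fuse text features into image features via a learned affine transform.

    Sketch: the text vector predicts a per-channel scale W(t) and bias b(t)
    that modulate the image feature map h, i.e. h' = h * W(t) + b(t).
    """

    def __init__(self, text_dim: int, img_channels: int):
        super().__init__()
        # Project the text embedding to per-channel scale and bias.
        self.to_scale = nn.Linear(text_dim, img_channels)
        self.to_bias = nn.Linear(text_dim, img_channels)

    def forward(self, img_feat: torch.Tensor, text_feat: torch.Tensor) -> torch.Tensor:
        # img_feat: (B, C, H, W); text_feat: (B, text_dim)
        scale = self.to_scale(text_feat).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        bias = self.to_bias(text_feat).unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        return img_feat * scale + bias

# Usage: modulate a 64-channel feature map with a 256-d sentence embedding.
acm = AffineCombination(text_dim=256, img_channels=64)
h = torch.randn(2, 64, 32, 32)
t = torch.randn(2, 256)
print(acm(h, t).shape)  # torch.Size([2, 64, 32, 32])
```

Predicting a scale and a bias, rather than simply concatenating the two feature sets, lets the text selectively amplify or suppress image channels while leaving the spatial structure of the feature map intact.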