Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation

Paper Title

Adma-GAN: Attribute-Driven Memory Augmented GANs for Text-to-Image Generation

Authors

Xintian Wu, Hanbin Zhao, Liangli Zheng, Shouhong Ding, Xi Li

Abstract

Text-to-image generation is a challenging task that aims to generate photo-realistic and semantically consistent images from given text descriptions. Existing methods mainly extract text information from a single sentence to represent an image, and this text representation strongly affects the quality of the generated image. However, relying on the limited information in one sentence misses some key attribute descriptions, which are crucial for describing an image accurately. To alleviate this problem, we propose an effective text representation method that is complemented with attribute information. First, we construct an attribute memory that, together with the sentence input, jointly controls text-to-image generation. Second, we explore two update mechanisms, sample-aware and sample-joint, to dynamically optimize a generalized attribute memory. Furthermore, we design an attribute-sentence-joint conditional generator learning scheme to align the feature embeddings among multiple representations, which promotes cross-modal network training. Experimental results show that the proposed method obtains substantial performance improvements on both the CUB (FID from 14.81 to 8.57) and COCO (FID from 21.42 to 12.39) datasets.
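
To put the reported FID numbers on a common scale, the short sketch below computes the relative reduction for each dataset. It uses only the four values quoted in the abstract; the helper name `relative_reduction` and the dictionary layout are illustrative assumptions, and the abstract does not state which baseline the starting values come from.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional decrease from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# FID values quoted in the abstract (lower FID is better).
# Each entry is (starting FID, FID of the proposed method).
reported_fid = {
    "CUB": (14.81, 8.57),
    "COCO": (21.42, 12.39),
}

for dataset, (start, proposed) in reported_fid.items():
    drop = relative_reduction(start, proposed)
    print(f"{dataset}: {start} -> {proposed} ({drop:.1%} lower)")
```

By this measure, the two reported improvements are of similar relative size, roughly 42% on each dataset.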
