Improved Masked Image Generation with Token-Critic

Title

Improved Masked Image Generation with Token-Critic

Authors

Lezama, José; Chang, Huiwen; Jiang, Lu; Essa, Irfan

Abstract

Non-autoregressive generative transformers have recently demonstrated impressive image generation performance, with sampling that is orders of magnitude faster than that of their autoregressive counterparts. However, optimal parallel sampling from the true joint distribution of visual tokens remains an open challenge. In this paper we introduce Token-Critic, an auxiliary model that guides the sampling of a non-autoregressive generative transformer. Given a masked-and-reconstructed real image, the Token-Critic model is trained to distinguish which visual tokens belong to the original image and which were sampled by the generative transformer. During non-autoregressive iterative sampling, Token-Critic is used to select which tokens to accept and which to reject and resample. Coupled with Token-Critic, a state-of-the-art generative transformer significantly improves its performance and outperforms recent diffusion models and GANs in terms of the trade-off between generated image quality and diversity on the challenging task of class-conditional ImageNet generation.
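The sampling procedure summarized above (refill the masked tokens, let the critic decide which to keep, re-mask the rest) can be illustrated with a small, self-contained sketch. It is not the authors' implementation: `toy_generator` and `toy_critic` are hypothetical random stand-ins for the generative transformer and the Token-Critic network, `MASK` and `VOCAB_SIZE` are illustrative constants, the cosine re-masking schedule is an assumption borrowed from MaskGIT-style samplers, and the critic's score is assumed to be higher for tokens it judges to be generator-sampled. `critic_training_example` sketches how a masked-and-reconstructed training input and its original-versus-sampled labels could be built.

```python
import math
import random

MASK = -1        # hypothetical sentinel id for a masked position
VOCAB_SIZE = 16  # hypothetical codebook size for the toy models


def toy_generator(tokens, rng):
    """Hypothetical stand-in for the generative transformer: fills every
    masked position with a random codebook id, keeps the other tokens."""
    return [rng.randrange(VOCAB_SIZE) if t == MASK else t for t in tokens]


def toy_critic(tokens, rng):
    """Hypothetical stand-in for Token-Critic: one score per token,
    assumed higher when the token looks generator-sampled."""
    return [rng.random() for _ in tokens]


def critic_training_example(real_tokens, generator, mask_ratio, rng):
    """Build one (input, labels) pair for critic training: mask a random
    subset of a real image's tokens, let the generator refill them, and
    label refilled positions 1 (sampled) and untouched positions 0 (original)."""
    n = len(real_tokens)
    num_masked = max(1, round(mask_ratio * n))
    masked_idx = set(rng.sample(range(n), num_masked))
    masked = [MASK if i in masked_idx else t for i, t in enumerate(real_tokens)]
    reconstructed = generator(masked, rng)
    labels = [1 if i in masked_idx else 0 for i in range(n)]
    return reconstructed, labels


def sample(generator, critic, num_tokens, num_steps, rng):
    """Iterative parallel sampling: each step the generator fills all masked
    tokens, then the critic chooses which tokens to reject and re-mask."""
    tokens = [MASK] * num_tokens
    for step in range(num_steps):
        tokens = generator(tokens, rng)
        if step == num_steps - 1:
            break  # final step: accept every token
        # Assumed cosine schedule: fraction of tokens re-masked for the next step.
        ratio = math.cos(math.pi / 2 * (step + 1) / num_steps)
        num_reject = math.floor(ratio * num_tokens)
        scores = critic(tokens, rng)
        # Reject the tokens the critic rates most likely to be generator-sampled.
        ranked = sorted(range(num_tokens), key=lambda i: scores[i], reverse=True)
        for i in ranked[:num_reject]:
            tokens[i] = MASK
    return tokens


rng = random.Random(0)
image_tokens = sample(toy_generator, toy_critic, num_tokens=64, num_steps=8, rng=rng)
```

Because the toy generator only refills masked positions, tokens the critic accepts carry over unchanged, and the number of re-masked tokens shrinks each step until the final step accepts everything. A practical version would operate on batched tensors and replace the toy models with trained networks.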
