Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model

Paper Title

Your ViT is Secretly a Hybrid Discriminative-Generative Diffusion Model

Authors

Xiulong Yang, Sheng-Min Shih, Yinlin Fu, Xiaoting Zhao, Shihao Ji

Abstract

Denoising Diffusion Probabilistic Models (DDPM) and Vision Transformers (ViT) have demonstrated significant progress in generative and discriminative tasks, respectively, and thus far these models have largely been developed in their own domains. In this paper, we establish a direct connection between DDPM and ViT by integrating the ViT architecture into DDPM, and introduce a new generative model called Generative ViT (GenViT). The modeling flexibility of ViT enables us to further extend GenViT to hybrid discriminative-generative modeling, and introduce a Hybrid ViT (HybViT). Our work is among the first to explore a single ViT for image generation and classification jointly. We conduct a series of experiments to analyze the performance of the proposed models and demonstrate their superiority over prior state-of-the-art methods in both generative and discriminative tasks. Our code and pre-trained models can be found at https://github.com/sndnyang/Diffusion_ViT.
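
As a rough illustration of the two ingredients the abstract combines, the sketch below pairs the standard DDPM forward noising step with a joint objective that adds a classification term to the noise-prediction loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the linear noise schedule, the additive form of the loss, and the `weight` parameter are all hypothetical choices made for illustration, and the network itself (a ViT in the paper) is replaced by placeholder predictions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default; assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0); return the noised sample and the noise used."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def mse(pred, target):
    """Mean squared error between predicted and true noise."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def hybrid_loss(eps_pred, eps, logits, label, weight=1.0):
    """Hypothetical joint objective: denoising loss plus weighted classification loss."""
    return mse(eps_pred, eps) + weight * cross_entropy(logits, label)


# Toy usage: a 4-value "image", noised at step 500 of 1000.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.5, -0.25, 0.0, 1.0]
xt, eps = q_sample(x0, 500, alpha_bars, rng)

# Placeholder model outputs: a perfect noise prediction and uniform class logits.
loss = hybrid_loss(eps_pred=eps, eps=eps, logits=[0.0, 0.0], label=0)
```

In a real model, a single shared backbone would produce both `eps_pred` and `logits`; how the two terms are balanced and trained is specified in the paper and the linked repository, not here.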

Paper Title