Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models

Paper title

Image Generation with Multimodal Priors using Denoising Diffusion Probabilistic Models

Authors

Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara, Vishal M. Patel

Abstract

Image synthesis under multi-modal priors is a useful and challenging task that has received increasing attention in recent years. A major challenge in using generative models for this task is the lack of paired data containing all modalities (i.e. priors) and the corresponding outputs. In recent work, a variational auto-encoder (VAE) model was trained in a weakly supervised manner to address this challenge. Because the generative power of VAEs is usually limited, it is difficult for this method to synthesize images belonging to complex distributions. To this end, we propose a solution based on denoising diffusion probabilistic models to synthesize images under multi-modal priors. Based on the fact that the distribution over each time step in the diffusion model is Gaussian, we show that there exists a closed-form expression for generating the image that corresponds to the given modalities. The proposed solution does not require explicit retraining for all modalities and can leverage the outputs of individual modalities to generate realistic images according to different constraints. We conduct studies on two real-world datasets to demonstrate the effectiveness of our approach.
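
The closed-form claim in the abstract rests on the per-step distributions being Gaussian. One standard fact that makes such closed forms possible is that a product of Gaussian densities is again proportional to a Gaussian, with a precision-weighted mean. The Python sketch below illustrates only that general fact for scalar values; it is not the expression derived in the paper. The function name `fuse_gaussians` and the example means and variances are hypothetical, chosen for illustration.

```python
from typing import Sequence, Tuple


def fuse_gaussians(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Return (mean, variance) of the normalized product of 1-D Gaussians.

    Illustrative assumption: each modality contributes one Gaussian
    N(mean_i, variance_i) over the same scalar quantity.
    """
    if len(means) != len(variances) or len(means) == 0:
        raise ValueError("means and variances must be non-empty and equal in length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Precisions (inverse variances) add under multiplication of densities.
    precision = sum(1.0 / v for v in variances)
    # The fused mean is the precision-weighted average of the input means.
    mean = sum(m / v for m, v in zip(means, variances)) / precision
    return mean, 1.0 / precision


# Hypothetical predictions from two modalities for one pixel at one time step.
fused_mean, fused_variance = fuse_gaussians([0.0, 2.0], [1.0, 1.0])
print(fused_mean, fused_variance)
```

With equal variances the fused mean is the plain average of the inputs, and the fused variance is smaller than either input variance, which reflects that each additional Gaussian term narrows the combined distribution.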

Paper title