Diffusion Probabilistic Model Made Slim

Paper Title

Diffusion Probabilistic Model Made Slim

Authors

Xingyi Yang, Daquan Zhou, Jiashi Feng, Xinchao Wang

Abstract

Despite their recent visually pleasing results, diffusion probabilistic models (DPMs) have long suffered from massive computational cost, which in turn greatly limits their application on resource-limited platforms. Prior methods for efficient DPMs, however, have largely focused on accelerating testing and have overlooked the models' huge complexity and size. In this paper, we make a dedicated attempt to lighten DPMs while striving to preserve their favourable performance. We start by training a small-sized latent diffusion model (LDM) from scratch, but observe a significant fidelity drop in the synthetic images. Through a thorough assessment, we find that DPMs are intrinsically biased against high-frequency generation and learn to recover different frequency components at different time-steps. These properties make compact networks unable to represent frequency dynamics with accurate high-frequency estimation. To this end, we introduce a customized design for slim DPMs, which we term Spectral Diffusion (SD), for light-weight image synthesis. SD incorporates wavelet gating in its architecture to enable frequency-dynamic feature extraction at every reverse step, and conducts spectrum-aware distillation to promote high-frequency recovery by inversely weighting the objective based on spectrum magnitudes. Experimental results demonstrate that SD achieves an 8-18x reduction in computational complexity compared to latent diffusion models on a series of conditional and unconditional image generation tasks, while retaining competitive image fidelity.
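The abstract does not give the form of the spectrum-aware objective, so the following is only a minimal sketch of the general idea of inversely weighting a distillation loss by spectrum magnitude. The function name `spectrum_weighted_loss`, the parameters `alpha` and `eps`, the use of a 2D FFT, and the weight normalization are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def spectrum_weighted_loss(student, teacher, alpha=1.0, eps=1e-8):
    """Hypothetical inverse-spectrum-weighted distillation loss.

    Illustrative only: not the objective used in the paper.
    """
    # 2D FFT over the last two (spatial) axes.
    s_freq = np.fft.fft2(student, axes=(-2, -1))
    t_freq = np.fft.fft2(teacher, axes=(-2, -1))
    # Assumed weighting: bins where the teacher spectrum is large
    # (usually low frequencies) get small weights, and vice versa.
    weights = 1.0 / (np.abs(t_freq) + eps) ** alpha
    weights = weights / weights.mean()  # normalize to mean weight 1
    # Weighted squared error between the two spectra.
    return float(np.mean(weights * np.abs(s_freq - t_freq) ** 2))

# Toy teacher: strong constant (low-frequency) part, weak checkerboard
# (high-frequency) part.
n = 8
yy, xx = np.indices((n, n))
checker = (-1.0) ** (yy + xx)
teacher = 10.0 + 0.1 * checker

low_err = spectrum_weighted_loss(teacher + 0.5, teacher)             # low-frequency error
high_err = spectrum_weighted_loss(teacher + 0.5 * checker, teacher)  # high-frequency error
print(high_err > low_err)
```

In this toy setup both perturbations have the same energy, but the error placed in the weak high-frequency band is penalized more heavily, which is the behaviour the abstract attributes to the inverse weighting.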
