Title

Diffusion Probabilistic Model Made Slim

Authors

Xingyi Yang, Daquan Zhou, Jiashi Feng, Xinchao Wang

Abstract

Despite the visually pleasing results achieved recently, the massive computational cost has been a long-standing flaw of diffusion probabilistic models (DPMs), which in turn greatly limits their application on resource-limited platforms. Prior methods towards efficient DPMs, however, have largely focused on accelerating testing while overlooking the models' huge complexity and size. In this paper, we make a dedicated attempt to lighten DPMs while striving to preserve their favourable performance. We start by training a small-sized latent diffusion model (LDM) from scratch, but observe a significant fidelity drop in the synthesized images. Through a thorough assessment, we find that DPMs are intrinsically biased against high-frequency generation and learn to recover different frequency components at different time-steps. These properties make compact networks unable to represent the frequency dynamics with accurate high-frequency estimation. To this end, we introduce a customized design for slim DPMs, which we term Spectral Diffusion (SD), for light-weight image synthesis. SD incorporates wavelet gating into its architecture to enable frequency-dynamic feature extraction at every reverse step, and conducts spectrum-aware distillation to promote high-frequency recovery by inversely weighting the objective based on spectrum magnitudes. Experimental results demonstrate that SD achieves an 8-18x reduction in computational complexity compared with latent diffusion models on a series of conditional and unconditional image generation tasks, while retaining competitive image fidelity.
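One standard way to see why high frequencies are recovered last is to look at the DDPM forward process in the frequency domain. Assuming an orthonormal 2D DFT (which maps white Gaussian noise to white Gaussian noise with unit expected power per coefficient), the per-frequency signal-to-noise ratio at step t works out as below. This is an illustrative derivation under that assumption, not necessarily the analysis carried out in the paper.

```latex
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\hat{x}_t(f) = \sqrt{\bar{\alpha}_t}\, \hat{x}_0(f) + \sqrt{1 - \bar{\alpha}_t}\, \hat{\epsilon}(f), \qquad \mathbb{E}\,|\hat{\epsilon}(f)|^2 = 1

\mathrm{SNR}_t(f) = \frac{\bar{\alpha}_t\, |\hat{x}_0(f)|^2}{(1 - \bar{\alpha}_t)\, \mathbb{E}\,|\hat{\epsilon}(f)|^2}
                  = \frac{\bar{\alpha}_t}{1 - \bar{\alpha}_t}\, |\hat{x}_0(f)|^2
```

Since \bar{\alpha}_t decreases with t and the power spectrum |\hat{x}_0(f)|^2 of natural images decays roughly as 1/|f|^2, high-frequency coefficients fall below SNR 1 at smaller t than low-frequency ones. Run in reverse, coarse structure is settled in the early (high-t) denoising steps and fine detail only in the last few, which matches the time-step-dependent frequency recovery stated in the abstract.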
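The wavelet-gating idea can be illustrated with a minimal pure-Python sketch. It assumes a single-level orthonormal Haar transform and one scalar gate per subband; the helpers `haar_dwt2`, `haar_idwt2` and `wavelet_gate`, and the hand-supplied gate logits, are hypothetical. In SD the gating is learned inside the network and applied at every reverse step, so this shows only the mechanism of splitting features by frequency, re-weighting each band, and reconstructing.

```python
import math

def haar_dwt2(x):
    """One level of the orthonormal 2D Haar transform.
    x: 2D list with even height and width. Returns (LL, LH, HL, HH)."""
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    bands = ([], [], [], [])
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)  # LL: local average (low frequency)
            rows[1].append((a - b + c - d) / 2)  # LH: change across columns
            rows[2].append((a + b - c - d) / 2)  # HL: change across rows
            rows[3].append((a - b - c + d) / 2)  # HH: diagonal change (highest frequency)
        for band, row in zip(bands, rows):
            band.append(row)
    return bands

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2: rebuilds each 2x2 block from its 4 coefficients."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + p + q + r) / 2, (s - p + q - r) / 2]
            bottom += [(s + p - q - r) / 2, (s - p - q + r) / 2]
        out += [top, bottom]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def wavelet_gate(x, gate_logits):
    """Scale each Haar subband by sigmoid(logit), then reconstruct.
    gate_logits: 4 floats for (LL, LH, HL, HH). Hypothetical: a real model
    would predict these from its features rather than take them as input."""
    bands = haar_dwt2(x)
    gated = [[[v * sigmoid(g) for v in row] for row in band]
             for band, g in zip(bands, gate_logits)]
    return haar_idwt2(*gated)
```

A 2x2 checkerboard such as `[[1.0, -1.0], [-1.0, 1.0]]` lands entirely in the HH band, so calling `wavelet_gate` with logits `(6.0, 6.0, 6.0, -6.0)` nearly erases it, while a linear ramp (which has no diagonal detail) only gets scaled by sigmoid(6), roughly 0.998.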
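The spectrum-aware distillation objective can be sketched in the same spirit. Here each frequency's squared student-teacher error is weighted by the inverse of the teacher's spectral magnitude raised to a power `alpha`, with the weights normalised to sum to one. Which spectrum is used, the exponent, `eps` and the normalisation are all assumptions made for illustration, and `dft2`, `spectrum_weighted_loss` and `pixel_mse` are hypothetical helpers, not the paper's code.

```python
import cmath

def dft2(x):
    """Naive, unnormalised 2D DFT of a real 2D list (O(H^2 W^2); tiny inputs only)."""
    h, w = len(x), len(x[0])
    return [[sum(x[m][n] * cmath.exp(-2j * cmath.pi * (u * m / h + v * n / w))
                 for m in range(h) for n in range(w))
             for v in range(w)]
            for u in range(h)]

def spectrum_weighted_loss(student, teacher, alpha=1.0, eps=1e-3):
    """Squared spectral error, weighted by (|teacher spectrum| + eps)^(-alpha).
    Frequencies where the teacher has little energy (typically high ones)
    receive the largest weights."""
    fs, ft = dft2(student), dft2(teacher)
    h, w = len(ft), len(ft[0])
    weights = [[(abs(ft[u][v]) + eps) ** (-alpha) for v in range(w)] for u in range(h)]
    total_w = sum(sum(row) for row in weights)  # normalise weights to sum to 1
    loss = 0.0
    for u in range(h):
        for v in range(w):
            loss += weights[u][v] / total_w * abs(fs[u][v] - ft[u][v]) ** 2
    return loss

def pixel_mse(a, b):
    """Plain pixel-space mean squared error, for comparison."""
    n = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n
```

On a flat 4x4 teacher patch, a uniform +0.1 offset and a +/-0.1 checkerboard have the same pixel MSE, but only the checkerboard falls where the teacher spectrum is weak, so `spectrum_weighted_loss` penalises it several orders of magnitude more. That inverse weighting is what pushes the student towards recovering high-frequency content.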