Paper Title
Provably robust deep generative models
Paper Authors
Paper Abstract
Recent work in adversarial attacks has developed provably robust methods for training deep neural network classifiers. However, although they are often mentioned in the context of robustness, deep generative models themselves have received relatively little attention in terms of formally analyzing their robustness properties. In this paper, we propose a method for training provably robust generative models, specifically a provably robust version of the variational auto-encoder (VAE). To do so, we first formally define a (certifiably) robust lower bound on the variational lower bound of the likelihood, and then show how this bound can be optimized during training to produce a robust VAE. We evaluate the method on simple examples, and show that it is able to produce generative models that are substantially more robust to adversarial attacks (i.e., an adversary trying to perturb inputs so as to drastically lower their likelihood under the model).
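The kind of attack described above can be illustrated concretely. The sketch below is not the paper's method or architecture; it is a minimal toy example, assuming a linear-Gaussian VAE with fixed, randomly chosen weights, where the evidence lower bound (ELBO) has a closed form. An FGSM-style adversary then perturbs the input against the numerical gradient of the ELBO, lowering the input's likelihood bound under the model:

```python
import numpy as np

# Toy linear-Gaussian VAE (illustrative only; weights are fixed and random).
# Encoder: q(z|x) = N(A x, diag(s^2)); decoder: p(x|z) = N(W z, I); prior N(0, I).
rng = np.random.default_rng(0)
d_x, d_z = 4, 2
A = rng.normal(size=(d_z, d_x)) * 0.5   # encoder mean weights (assumed)
W = rng.normal(size=(d_x, d_z)) * 0.5   # decoder mean weights (assumed)
log_s = np.full(d_z, -1.0)              # fixed encoder log-std

def elbo(x):
    """Closed-form ELBO for this toy model, up to additive constants."""
    mu = A @ x
    s2 = np.exp(2 * log_s)
    # E_q[-0.5 ||x - W z||^2] in closed form for Gaussian q:
    recon = -0.5 * (np.sum((x - W @ mu) ** 2)
                    + np.sum(s2 * np.sum(W ** 2, axis=0)))
    # KL(N(mu, diag(s2)) || N(0, I)), standard closed form:
    kl = 0.5 * np.sum(mu ** 2 + s2 - 1.0 - np.log(s2))
    return recon - kl

def fgsm_attack(x, eps=0.1, h=1e-5):
    """One FGSM-style step against the (numerical) ELBO gradient."""
    g = np.array([(elbo(x + h * e) - elbo(x - h * e)) / (2 * h)
                  for e in np.eye(len(x))])
    return x - eps * np.sign(g)   # step so as to *decrease* the ELBO

x = rng.normal(size=d_x)
x_adv = fgsm_attack(x)
print(elbo(x), elbo(x_adv))  # the perturbed input scores a lower ELBO
```

A provably robust VAE in the paper's sense would instead be trained against a certified lower bound on the ELBO over the whole perturbation ball, rather than against single attack points like this one.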