Paper title
Adversarial robustness of VAEs through the lens of local geometry
Paper authors
Paper abstract
In an unsupervised attack on variational autoencoders (VAEs), an adversary finds a small perturbation of an input sample that significantly changes its latent space encoding, thereby compromising the reconstruction for a fixed decoder. A known reason for this vulnerability is the distortion of the latent space that results from a mismatch between the approximate latent posterior and the prior distribution. Consequently, a slight change in an input sample can move its encoding to a low- or zero-density region of the latent space, resulting in unconstrained generation. This paper demonstrates that an optimal way for an adversary to attack VAEs is to exploit a directional bias of a stochastic pullback metric tensor induced by the encoder and decoder networks. The pullback metric tensor of an encoder measures the change in infinitesimal latent volume from the input space to the latent space. Thus, it can be viewed as a lens for analysing the effect of input perturbations that lead to latent space distortions. We propose robustness evaluation scores based on the eigenspectrum of a pullback metric tensor. Moreover, we empirically show that the scores correlate with the robustness parameter $\beta$ of the $\beta$-VAE. Since increasing $\beta$ also degrades reconstruction quality, we demonstrate a simple alternative that uses \textit{mixup} training to fill the empty regions in the latent space, thus improving robustness while also improving reconstruction.
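
To make the idea of a directional bias concrete, the following is a minimal sketch, not the authors' method. It assumes a toy deterministic encoder mean (a one-hidden-layer tanh network with random weights) rather than the stochastic encoder and decoder treated in the paper, and all names (`encoder_mean`, `encoder_jacobian`, `pullback_metric`) and dimensions are hypothetical. Under these assumptions the pullback metric is $G(x) = J(x)^\top J(x)$, where $J$ is the Jacobian of the encoder mean, and the eigenvector of $G$ with the largest eigenvalue is the unit input direction that moves the latent code the most to first order.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H, K = 8, 16, 2  # input, hidden, latent dimensions (hypothetical values)

# Random weights standing in for a trained encoder (assumption).
W1 = rng.normal(size=(H, D)) / np.sqrt(D)
b1 = rng.normal(size=H) * 0.1
W2 = rng.normal(size=(K, H)) / np.sqrt(H)
b2 = np.zeros(K)

def encoder_mean(x):
    """Toy encoder mean: mu(x) = W2 tanh(W1 x + b1) + b2."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def encoder_jacobian(x):
    """Analytic Jacobian d mu / d x, shape (K, D)."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ ((1.0 - h**2)[:, None] * W1)

def pullback_metric(x):
    """Pullback of the Euclidean latent metric: G = J^T J, shape (D, D)."""
    J = encoder_jacobian(x)
    return J.T @ J

x = rng.normal(size=D)
G = pullback_metric(x)

# eigh returns eigenvalues in ascending order for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(G)
v_top = eigvecs[:, -1]  # most sensitive unit input direction

# Compare the latent shift along v_top with a random unit direction.
eps = 1e-3
u = rng.normal(size=D)
u /= np.linalg.norm(u)
shift_top = np.linalg.norm(encoder_mean(x + eps * v_top) - encoder_mean(x))
shift_rand = np.linalg.norm(encoder_mean(x + eps * u) - encoder_mean(x))

print("largest eigenvalue:", eigvals[-1])
print("latent shift along top eigenvector:", shift_top)
print("latent shift along random direction:", shift_rand)
```

For a small step `eps`, the latent shift along `v_top` is approximately `eps * sqrt(eigvals[-1])`, so a spectrum dominated by a few large eigenvalues indicates a few input directions to which the encoding is especially sensitive. How the paper turns the eigenspectrum into its robustness scores is not specified in the abstract and is not reproduced here.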