Title
Effective and Robust Detection of Adversarial Examples via Benford-Fourier Coefficients
Authors
Abstract
Adversarial examples are well known to be a serious threat to deep neural networks (DNNs). In this work, we study the detection of adversarial examples based on the assumption that the output and internal responses of a DNN model to both adversarial and benign examples follow the generalized Gaussian distribution (GGD), but with different parameters (i.e., shape factor, mean, and variance). GGD is a general family of distributions that covers many popular distributions (e.g., Laplacian, Gaussian, and uniform), so it is more likely than any single specific distribution to approximate the intrinsic distributions of the internal responses. Moreover, since the shape factor is more robust across different databases than the other two parameters, we propose to construct discriminative features for adversarial detection from the shape factor, using the magnitudes of the Benford-Fourier coefficients (MBF), which can be easily estimated from the responses. Finally, a support vector machine is trained on the MBF features to serve as the adversarial detector. Extensive experiments on image classification demonstrate that, compared to state-of-the-art adversarial detection methods, the proposed detector is much more effective and robust in detecting adversarial examples crafted by different methods and from different sources.
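Why the MBF tracks the shape factor while ignoring mean and variance can be shown with a short derivation. This is a sketch under stated assumptions: it uses the standard GGD parameterization (mean mu, scale alpha, shape beta) and the usual definition of the n-th Benford-Fourier coefficient of Y = X - mu as the expectation of exp(-j 2 pi n log10|Y|). The paper's exact normalization and choice of orders n are not given in this abstract.

```latex
% Generalized Gaussian density: mean \mu, scale \alpha > 0, shape \beta > 0
p(x;\mu,\alpha,\beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)}
  \exp\!\left[-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right],
\qquad \sigma^2 = \alpha^2\,\frac{\Gamma(3/\beta)}{\Gamma(1/\beta)}

% \beta = 1: Laplacian;  \beta = 2: Gaussian;  \beta \to \infty: uniform on [\mu-\alpha, \mu+\alpha]

% n-th Benford-Fourier coefficient of Y = X - \mu (definition assumed here)
a_n = \mathbb{E}\!\left[e^{-j 2\pi n \log_{10}|Y|}\right]
    = \mathbb{E}\!\left[|Y|^{s_n}\right],
\qquad s_n = -\frac{j\,2\pi n}{\ln 10}

% Fractional moment of the zero-mean GGD, valid for \operatorname{Re}(s) > -1
\mathbb{E}\!\left[|Y|^{s}\right] = \alpha^{s}\,
  \frac{\Gamma\!\left(\frac{1+s}{\beta}\right)}{\Gamma(1/\beta)}

% |\alpha^{s_n}| = |e^{s_n \ln \alpha}| = 1 for real \alpha > 0, hence
|a_n| = \frac{\left|\Gamma\!\left(\frac{1}{\beta} - \frac{j\,2\pi n}{\beta \ln 10}\right)\right|}
             {\Gamma(1/\beta)}
```

The final expression depends on beta alone: subtracting the mean removes mu, and changing the scale (hence the variance) only rotates the phase of a_n. For beta = 2 it reduces to |a_n| = cosh(pi^2 n / ln 10)^(-1/2), about 0.166 for n = 1; for beta = 1 it is sqrt(x / sinh x) with x = 2 pi^2 n / ln 10, about 0.057 for n = 1.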
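The pipeline in the abstract (estimate MBF features from responses, then train an SVM on them) can be sketched in plain Python. Everything here is an illustrative assumption rather than the authors' implementation: the helper names (`sample_ggd`, `mbf_features`, `standardize`, `train_linear_svm`, `predict`, `demo`) are hypothetical, the orders n = 1, 2, 3 are arbitrary, the detector is a linear SVM trained with Pegasos-style subgradient descent (a stand-in for whatever SVM solver and kernel the paper uses), and the toy data simply draws "benign" response sets from a GGD with shape 2 and "adversarial" ones from a GGD with shape 1. No real DNN responses are involved.

```python
import cmath
import math
import random

# Hypothetical helpers for illustration; not the authors' code.


def sample_ggd(n, shape, scale=1.0, mean=0.0, rng=random):
    """Draw n GGD samples using |(X - mean) / scale| ** shape ~ Gamma(1/shape, 1)."""
    out = []
    for _ in range(n):
        g = rng.gammavariate(1.0 / shape, 1.0)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        out.append(mean + sign * scale * g ** (1.0 / shape))
    return out


def mbf_features(responses, orders=(1, 2, 3)):
    """Magnitudes |a_n| of the empirical Benford-Fourier coefficients.

    a_n is estimated as the average of exp(-2j*pi*n*log10|x - mu|), with mu the
    sample mean; values equal to mu are skipped because log10(0) is undefined.
    """
    mu = sum(responses) / len(responses)
    logs = [math.log10(abs(x - mu)) for x in responses if x != mu]
    return [abs(sum(cmath.exp(-2j * math.pi * n * t) for t in logs)) / len(logs)
            for n in orders]


def standardize(train, test):
    """Z-score every feature using statistics from the training rows only."""
    d = len(train[0])
    mean = [sum(r[j] for r in train) / len(train) for j in range(d)]
    std = [math.sqrt(sum((r[j] - mean[j]) ** 2 for r in train) / len(train)) or 1.0
           for j in range(d)]

    def scale(rows):
        return [[(r[j] - mean[j]) / std[j] for j in range(d)] for r in rows]

    return scale(train), scale(test)


def train_linear_svm(X, y, lam=0.01, epochs=50, rng=random):
    """Linear SVM via Pegasos-style subgradient descent on the regularized hinge loss.

    Labels must be -1 or +1. A constant 1.0 feature is appended so the bias is
    learned together with the weights (and, unlike a textbook SVM, is regularized).
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge loss active: move w to enlarge this margin
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w, x):
    """Return +1 (flagged as adversarial) or -1 (treated as benign)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1


def demo(seed=0, n_per_class=30, n_responses=3000):
    """Toy run: 'benign' sets ~ GGD(shape=2), 'adversarial' sets ~ GGD(shape=1).

    These are hypothetical stand-ins, not DNN measurements. Each set gets a random
    mean and scale, so only the shape factor separates the two classes.
    """
    rng = random.Random(seed)

    def make(n):
        X, y = [], []
        for label, shape in ((-1, 2.0), (1, 1.0)):
            for _ in range(n):
                resp = sample_ggd(n_responses, shape, scale=rng.uniform(0.1, 10.0),
                                  mean=rng.uniform(-5.0, 5.0), rng=rng)
                X.append(mbf_features(resp))
                y.append(label)
        return X, y

    X_tr, y_tr = make(n_per_class)
    X_te, y_te = make(n_per_class)
    X_tr, X_te = standardize(X_tr, X_te)
    w = train_linear_svm(X_tr, y_tr, rng=rng)
    return sum(predict(w, x) == t for x, t in zip(X_te, y_te)) / len(y_te)
```

Calling `demo()` trains on one toy split and returns the accuracy on a second split. Because every response set receives a random mean and scale, any separation must come from the shape factor, which mirrors the robustness argument in the abstract. A real detector would instead compute `mbf_features` on the output and internal-layer responses of the DNN for each input, and could use an off-the-shelf SVM library in place of the hand-written solver.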