Paper Title
Towards Characterizing Adversarial Defects of Deep Learning Software from the Lens of Uncertainty
Paper Authors
Paper Abstract
Over the past decade, deep learning (DL) has been successfully applied to many industrial domain-specific tasks. However, current state-of-the-art DL software still suffers from quality issues, which raises great concern, especially in safety- and security-critical scenarios. Adversarial examples (AEs), on which DL software makes incorrect decisions, represent a typical and important type of defect that needs to be urgently addressed. Such defects arise either from intentional attacks or from physical-world noise perceived by input sensors, potentially hindering further industrial deployment. The intrinsic uncertainty of deep learning decisions can be a fundamental cause of such incorrect behavior. Although some testing, adversarial attack, and defense techniques have recently been proposed, a systematic study that uncovers the relationship between AEs and DL uncertainty is still lacking. In this paper, we conduct a large-scale study towards bridging this gap. We first investigate the capability of multiple uncertainty metrics to differentiate benign examples (BEs) from AEs, which enables characterizing the uncertainty patterns of input data. We then identify and categorize the uncertainty patterns of BEs and AEs, and find that while BEs and AEs generated by existing methods do follow common uncertainty patterns, some other uncertainty patterns are largely missed. Based on this, we propose an automated testing technique to generate multiple types of uncommon AEs and BEs that are largely missed by existing techniques. Our further evaluation reveals that the uncommon data generated by our method is hard to defend against with existing defense techniques, reducing the average defense success rate by 35%. Our results call for attention to the necessity of generating more diverse data for evaluating the quality assurance solutions of DL software.
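The abstract refers to multiple uncertainty metrics for distinguishing BEs from AEs but does not name them here. As a minimal, illustrative sketch only (not the paper's implementation; it assumes a PyTorch classifier with dropout layers, and the function names and sampling count are assumptions), the code below shows two widely used metrics of this kind: softmax predictive entropy and the Monte Carlo (MC) dropout variation ratio.

```python
# Illustrative sketch of two common prediction-uncertainty metrics that can be
# compared between benign and adversarial inputs. Assumes a PyTorch model with
# dropout layers; names and defaults here are assumptions, not the paper's.
import numpy as np
import torch


def predictive_entropy(probs: np.ndarray) -> float:
    """Entropy of a softmax output vector; higher values mean more uncertainty."""
    eps = 1e-12  # avoid log(0)
    return float(-np.sum(probs * np.log(probs + eps)))


def mc_dropout_variation_ratio(model: torch.nn.Module,
                               x: torch.Tensor,
                               n_samples: int = 30) -> float:
    """Fraction of MC-dropout forward passes disagreeing with the modal class.

    The model is kept in train() mode so dropout stays active, yielding a
    different stochastic prediction on each pass.
    """
    model.train()  # keep dropout active for Monte Carlo sampling
    with torch.no_grad():
        preds = [model(x.unsqueeze(0)).argmax(dim=1).item()
                 for _ in range(n_samples)]
    modal_count = np.max(np.bincount(preds))
    return 1.0 - modal_count / n_samples
```

Under this sketch, an input whose predictions flip across dropout samples (high variation ratio) or whose softmax output is close to uniform (high entropy) would be flagged as more uncertain; the study's premise is that such uncertainty profiles differ between BEs and AEs.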