Paper title
Query Efficient Decision Based Sparse Attacks Against Black-Box Deep Learning Models
Paper authors
Abstract
Despite our best efforts, deep learning models remain highly vulnerable to even tiny adversarial perturbations applied to their inputs. The ability to craft adversarial perturbations against a black-box model using only the model's output is a practical threat to real-world systems, such as autonomous cars or machine learning models exposed as a service (MLaaS). Of particular interest are sparse attacks. The realization of sparse attacks against black-box models demonstrates that machine learning models are more vulnerable than we believe, because these attacks aim to minimize the number of perturbed pixels (measured by the l_0 norm) required to mislead a model while observing only the decision (the predicted label) returned for a model query: the so-called decision-based attack setting. However, such an attack leads to an NP-hard optimization problem. We develop an evolution-based algorithm, SparseEvo, for the problem and evaluate it against both convolutional deep neural networks and vision transformers. Notably, vision transformers are yet to be investigated under a decision-based attack setting. SparseEvo requires significantly fewer model queries than the state-of-the-art sparse attack, Pointwise, for both untargeted and targeted attacks. The attack algorithm, although conceptually simple, is also competitive, with only a limited query budget, with state-of-the-art gradient-based white-box attacks on standard computer vision tasks such as ImageNet. Importantly, the query-efficient SparseEvo, along with decision-based attacks in general, raises new questions regarding the safety of deployed systems and opens new directions for studying and understanding the robustness of machine learning models.
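To make the decision-based sparse setting concrete, the following is a minimal sketch of a generic evolutionary search over binary pixel masks. It is not the SparseEvo algorithm itself, whose details the abstract does not give. The names `l0_distance`, `apply_mask`, `sparse_decision_attack`, the toy `predict` oracle, and all parameter values are assumptions made for illustration. The only feedback the search uses is the predicted label, and it tries to reduce the number of pixels that differ from the source image while the label stays wrong.

```python
import random


def l0_distance(x, x_prime):
    """Count the positions where two flat images differ (the l_0 'norm')."""
    return sum(1 for a, b in zip(x, x_prime) if a != b)


def apply_mask(source, start, mask):
    """Use the pixel from `start` where mask is 1, otherwise from `source`."""
    return [s if m else o for o, s, m in zip(source, start, mask)]


def sparse_decision_attack(predict, source, start, true_label,
                           pop_size=8, queries=500, seed=0):
    """Generic evolutionary sketch (hypothetical, not the paper's method).

    `predict` is a label-only oracle; `start` must already be misclassified.
    """
    if predict(start) == true_label:
        raise ValueError("start image must not be classified as true_label")
    rng = random.Random(seed)
    # Every individual begins as the all-ones mask, i.e. the starting image.
    population = [[1] * len(source) for _ in range(pop_size)]
    fitness = [l0_distance(source, start)] * pop_size
    for _ in range(queries):
        best = min(range(pop_size), key=fitness.__getitem__)
        worst = max(range(pop_size), key=fitness.__getitem__)
        child = population[best][:]
        ones = [i for i, m in enumerate(child) if m]
        if not ones:
            break
        # Mutation: restore one randomly chosen perturbed pixel to the source.
        child[rng.choice(ones)] = 0
        candidate = apply_mask(source, start, child)
        # One model query; only the decision (label) is observed.
        if predict(candidate) != true_label:
            score = l0_distance(source, candidate)
            if score <= fitness[worst]:
                population[worst] = child
                fitness[worst] = score
    best = min(range(pop_size), key=fitness.__getitem__)
    return apply_mask(source, start, population[best])


# Toy usage: a 16-pixel "image" and a threshold classifier standing in
# for a black-box model (both are assumptions for this example).
def predict(x):
    return int(sum(x) >= 30)


source = [0] * 16    # classified as label 0
start = [10] * 16    # classified as label 1
adversarial = sparse_decision_attack(predict, source, start, true_label=0)
print(l0_distance(source, adversarial), predict(adversarial))
```

In this toy problem at least three pixels of value 10 are needed to cross the threshold, so the search cannot go below three perturbed pixels. A real attack would query an image classifier instead and would need a more careful mutation and recombination strategy to stay within a limited query budget.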