Paper title
Towards adversarial robustness with 01 loss neural networks
Paper authors
Paper abstract
Motivated by the general robustness properties of the 01 loss, we propose a single hidden layer 01 loss neural network trained with stochastic coordinate descent as a defense against adversarial attacks in machine learning. One measure of a model's robustness is the minimum distortion required to make an input adversarial. This can be approximated with the Boundary Attack (Brendel et al., 2018) and HopSkipJump (Chen et al., 2019) methods. We compare the minimum distortion of the 01 loss network to that of a binarized neural network and a standard sigmoid activation network with cross-entropy loss, all trained with and without Gaussian noise, on the CIFAR10 benchmark binary classification task between classes 0 and 1. Both with and without noise training, we find that our 01 loss network has the largest adversarial distortion of the three models by non-trivial margins. To further validate these results, we subject all models to substitute model black box attacks under different distortion thresholds and find that the 01 loss network is the hardest to attack across all distortions. At a distortion of 0.125, both the sigmoid activated cross-entropy loss network and the binarized network have almost 0% accuracy on adversarial examples, whereas the 01 loss network retains 40%. Even though both the 01 loss network and the binarized network use sign activations, their training algorithms differ, which in turn yields different solutions for robustness. Finally, we compare our network to simple convolutional models under substitute model black box attacks and find their accuracies to be comparable. Our work shows that the 01 loss network has the potential to defend against black box adversarial attacks better than convex loss and binarized networks.
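
To make the model concrete, the following is a minimal sketch, not the authors' implementation, of a single hidden layer network with sign activations whose 01 loss is minimized by stochastic coordinate descent. The coordinate-update rule, step size, layer width, and toy data below are all illustrative assumptions; the paper's actual training procedure may differ.

```python
# A minimal sketch (not the authors' code) of a single hidden layer network
# with sign activations, trained by stochastic coordinate descent on the
# 01 loss. Hyperparameters and the update rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def predict(X, W, w):
    H = np.sign(X @ W)                      # hidden layer: sign activations
    return (H @ w > 0).astype(int)          # output: thresholded linear layer

def zero_one_loss(X, y, W, w):
    return np.mean(predict(X, W, w) != y)   # fraction misclassified

def train_scd(X, y, n_hidden=20, n_iters=5000, step=0.1):
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))
    w = rng.standard_normal(n_hidden)
    best = zero_one_loss(X, y, W, w)
    for _ in range(n_iters):
        # Perturb one randomly chosen coordinate and keep the change only if
        # the (non-differentiable) 01 loss does not increase.
        if rng.random() < 0.5:              # a hidden layer weight
            i, j = rng.integers(n_features), rng.integers(n_hidden)
            old = W[i, j]
            W[i, j] += step * rng.standard_normal()
            loss = zero_one_loss(X, y, W, w)
            if loss <= best:
                best = loss
            else:
                W[i, j] = old               # revert the move
        else:                               # an output weight
            j = rng.integers(n_hidden)
            old = w[j]
            w[j] += step * rng.standard_normal()
            loss = zero_one_loss(X, y, W, w)
            if loss <= best:
                best = loss
            else:
                w[j] = old
    return W, w, best

# Toy usage on random data; for CIFAR10 classes 0 vs 1, X would hold
# flattened 3072-dimensional images and y the binary labels.
X = rng.standard_normal((200, 32))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
W, w, loss = train_scd(X, y)
print(f"final training 01 loss: {loss:.3f}")
```

The minimum-distortion measure can be sketched in the same hedged spirit: both the Boundary Attack and HopSkipJump rely on a binary search along the segment between a clean input and a misclassified point to locate the decision boundary. The hypothetical helper below shows only that shared primitive, not either attack in full.

```python
# Hedged sketch of the minimum-distortion measure: binary-search the segment
# between a clean input x and any adversarial point x_adv for the decision
# boundary, then report the L2 distance from x to that boundary point.
def min_distortion_estimate(model, x, x_adv, y_true, tol=1e-3):
    """model(x) -> predicted label; x is clean, x_adv any misclassified point."""
    lo, hi = 0.0, 1.0                       # interpolation fraction toward x_adv
    while hi - lo > tol:
        mid = (lo + hi) / 2
        x_mid = (1 - mid) * x + mid * x_adv
        if model(x_mid) != y_true:
            hi = mid                        # still adversarial: move toward x
        else:
            lo = mid                        # correct again: move toward x_adv
    x_boundary = (1 - hi) * x + hi * x_adv
    return np.linalg.norm(x_boundary - x)   # L2 distortion at the boundary
```

Under this measure, a more robust model is one whose decision boundary sits farther from typical inputs, i.e., one for which the estimated distortion is larger.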