Paper Title
Enhancing Targeted Attack Transferability via Diversified Weight Pruning
Paper Authors
Paper Abstract
Malicious attackers can generate targeted adversarial examples by imposing tiny noises, forcing neural networks to produce specific incorrect outputs. With cross-model transferability, network models remain vulnerable even in black-box settings. Recent studies have shown the effectiveness of ensemble-based methods in generating transferable adversarial examples. To further enhance transferability, model augmentation methods aim to produce more networks participating in the ensemble. However, existing model augmentation methods are only proven effective in untargeted attacks. In this work, we propose Diversified Weight Pruning (DWP), a novel model augmentation technique for generating transferable targeted attacks. DWP leverages the weight pruning method commonly used in model compression. Compared with prior work, DWP protects necessary connections and ensures the diversity of the pruned models simultaneously, both of which we show are crucial for targeted transferability. Experiments on the ImageNet-compatible dataset under various and more challenging scenarios confirm its effectiveness: transferring to adversarially trained models, non-CNN architectures, and Google Cloud Vision. The results show that our proposed DWP improves the targeted attack success rates by up to $10.1$%, $6.6$%, and $7.0$% on the combination of state-of-the-art methods in these three scenarios, respectively. The source code will be made available after acceptance.
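The abstract does not spell out how the pruned ensemble is built, so the following is only a minimal, hypothetical sketch (not the authors' implementation) of the general idea: protect large-magnitude ("necessary") connections in each convolutional layer and randomly zero a subset of the remaining small weights, so that repeated calls yield diverse surrogate models for an ensemble attack. The function name `diversified_prune`, the `keep_ratio`/`prune_prob` parameters, and the choice of ResNet-50 are assumptions made for illustration.

```python
# Hypothetical sketch of diversified weight pruning for model augmentation.
# Assumptions (not from the paper): a per-layer magnitude threshold protects
# "necessary" connections, and random masking of the remaining small weights
# keeps the pruned copies diverse across calls.
import copy

import torch
import torchvision.models as models  # torchvision >= 0.13 API assumed


def diversified_prune(model, keep_ratio=0.7, prune_prob=0.3):
    """Return a randomly pruned copy of `model`.

    Weights whose magnitude lies in the top `keep_ratio` fraction of each
    convolutional layer are protected; the rest are zeroed independently
    with probability `prune_prob`, so repeated calls yield diverse sub-models.
    """
    pruned = copy.deepcopy(model)
    with torch.no_grad():
        for module in pruned.modules():
            if isinstance(module, torch.nn.Conv2d):
                w = module.weight
                # Magnitude threshold below which weights become prunable.
                threshold = torch.quantile(w.abs().flatten(), 1.0 - keep_ratio)
                prunable = w.abs() < threshold            # small-magnitude weights
                random_mask = torch.rand_like(w) < prune_prob
                w[prunable & random_mask] = 0.0           # drop a random subset
    return pruned


# Build an ensemble of diverse pruned surrogates from one white-box model;
# the adversarial example would then be optimized against all of them jointly.
base = models.resnet50(weights=None).eval()
ensemble = [diversified_prune(base) for _ in range(4)]
```

A design note on this sketch: keeping the largest-magnitude weights intact is one plausible way to "protect necessary connections", while the per-call random mask supplies the diversity across pruned models that the abstract identifies as crucial for targeted transferability.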