Architectural Backdoors in Neural Networks

Paper title

Architectural Backdoors in Neural Networks

Authors

Mikel Bober-Irizar, Ilia Shumailov, Yiren Zhao, Robert Mullins, Nicolas Papernot

Abstract

Machine learning is vulnerable to adversarial manipulation. Previous literature has demonstrated that, at the training stage, attackers can manipulate data and data sampling procedures to control model behaviour. A common attack goal is to plant backdoors, i.e. to force the victim model to learn to recognise a trigger known only to the adversary. In this paper, we introduce a new class of backdoor attacks that hide inside model architectures, i.e. in the inductive bias of the functions used to train. These backdoors are simple to implement, for instance by publishing open-source code for a backdoored model architecture that others will reuse unknowingly. We demonstrate that model architectural backdoors represent a real threat and, unlike other approaches, can survive a complete re-training from scratch. We formalise the main construction principles behind architectural backdoors, such as a link between the input and the output, and describe some possible protections against them. We evaluate our attacks on computer vision benchmarks of different scales and demonstrate that the underlying vulnerability is pervasive in a variety of training settings.
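The abstract's central idea, a link between input and output that lives in the architecture rather than in the learned weights, can be illustrated with a toy sketch. This is not the authors' construction: the checkerboard trigger, the names `trigger_score`, `BackdooredClassifier`, `make_clean_image`, and `add_trigger`, the constant 100.0, and the use of freshly sampled random weights as a stand-in for re-training from scratch are all assumptions made for illustration only.

```python
import random


def trigger_score(image):
    """Weight-free detector (hypothetical): 1.0 if the top-left 2x2 patch
    is an exact checkerboard, else 0.0. It has no trainable parameters."""
    patch = (image[0][0], image[0][1], image[1][0], image[1][1])
    return 1.0 if patch == (1.0, 0.0, 0.0, 1.0) else 0.0


class BackdooredClassifier:
    """Toy linear classifier with a parameter-free path from input to output."""

    def __init__(self, n_pixels, n_classes, target_class, seed):
        # Assumption: sampling new weights stands in for re-training from scratch.
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(n_pixels)]
            for _ in range(n_classes)
        ]
        self.target_class = target_class

    def logits(self, image):
        flat = [pixel for row in image for pixel in row]
        out = [sum(w * x for w, x in zip(row_w, flat)) for row_w in self.weights]
        # Architectural link: bypasses every learned weight, so changing
        # self.weights cannot remove it.
        out[self.target_class] += 100.0 * trigger_score(image)
        return out

    def predict(self, image):
        scores = self.logits(image)
        return max(range(len(scores)), key=scores.__getitem__)


def make_clean_image(size=4):
    """A flat grey image with pixel values in [0, 1]."""
    return [[0.5] * size for _ in range(size)]


def add_trigger(image):
    """Return a copy of the image with the checkerboard patch stamped in."""
    stamped = [row[:] for row in image]
    stamped[0][0], stamped[0][1] = 1.0, 0.0
    stamped[1][0], stamped[1][1] = 0.0, 1.0
    return stamped
```

With 16 pixels in [0, 1] and weights in [-1, 1], no learned logit can exceed 16 in magnitude, so the added 100.0 always decides the prediction on a triggered input, whatever weights the model ends up with. A real attack would need a far less conspicuous detector; the sketch only shows why such a path is independent of the learned parameters.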
