Paper Title

Neurotoxin: Durable Backdoors in Federated Learning

Paper Authors

Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal

Paper Abstract

Due to their decentralized nature, federated learning (FL) systems have an inherent vulnerability during their training to adversarial backdoor attacks. In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs. (As a simple toy example, if a user types "people from New York" into a mobile keyboard app that uses a backdoored next word prediction model, then the model could autocomplete the sentence to "people from New York are rude"). Prior work has shown that backdoors can be inserted into FL models, but these backdoors are often not durable, i.e., they do not remain in the model after the attacker stops uploading poisoned updates. Thus, since training typically continues progressively in production FL systems, an inserted backdoor may not survive until deployment. Here, we propose Neurotoxin, a simple one-line modification to existing backdoor attacks that acts by attacking parameters that are changed less in magnitude during training. We conduct an exhaustive evaluation across ten natural language processing and computer vision tasks, and we find that we can double the durability of state of the art backdoors.
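To make the abstract's description concrete, the sketch below shows one way the core masking step could look in PyTorch: the attacker estimates which coordinates the benign updates change most in magnitude, and restricts the poisoned gradient to the remaining low-magnitude coordinates. The function and variable names (neurotoxin_mask, observed_benign_update, poisoned_grad) and the top-k projection details are illustrative assumptions based on the abstract, not the authors' reference implementation.

```python
import torch

def neurotoxin_mask(benign_update: torch.Tensor, top_ratio: float = 0.1) -> torch.Tensor:
    """Return a 0/1 mask keeping only coordinates whose benign-update magnitude is
    small, i.e., the parameters that are changed least during normal training."""
    flat = benign_update.abs().flatten()
    k = max(1, int(top_ratio * flat.numel()))
    # The k largest-magnitude benign coordinates are excluded from the attack.
    topk_idx = torch.topk(flat, k).indices
    mask = torch.ones_like(flat)
    mask[topk_idx] = 0.0
    return mask.reshape(benign_update.shape)

# Hypothetical use inside the attacker's local update step: project the poisoned
# gradient onto the low-magnitude coordinates before applying it (the "one-line"
# modification the abstract refers to).
# masked_grad = poisoned_grad * neurotoxin_mask(observed_benign_update)
# param.data -= lr * masked_grad
```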
