Paper Title

Backdoor Attacks on Multiagent Collaborative Systems

Authors

Shuo Chen, Yue Qiu, Jie Zhang

Abstract

Backdoor attacks on reinforcement learning implant a backdoor in a victim agent's policy. Once the victim observes the trigger signal, it switches to an abnormal mode and fails at its task. Most attacks assume the adversary can arbitrarily modify the victim's observations, which may not be practical. One work proposes letting an adversary agent use its own actions to affect its opponent in two-agent competitive games, so that the opponent quickly fails after observing certain trigger actions. In multiagent collaborative systems, however, agents may not always be able to observe one another: when and how much the adversary agent can affect the others is uncertain, and we want the adversary agent to trigger the others as few times as possible. To solve this problem, we first design a novel training framework that produces auxiliary rewards measuring the extent to which the other agents' observations are affected. We then use the auxiliary rewards to train a trigger policy that enables the adversary agent to efficiently affect the others' observations. Given these affected observations, we further train the other agents to behave abnormally. Extensive experiments demonstrate that the proposed method enables the adversary agent to lure the others into the abnormal mode with only a few actions.
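The abstract describes auxiliary rewards that measure how much the adversary's actions perturb the other agents' observations. Below is a minimal sketch of that idea under one simple assumption — that the auxiliary reward is the magnitude of the observation change an adversary action induces. The function name and the choice of Euclidean distance are illustrative; the paper's actual reward design is not specified in the abstract.

```python
import numpy as np

def auxiliary_reward(obs_before, obs_after):
    """Hypothetical auxiliary reward: the size of the shift in another
    agent's observation after the adversary acts. A larger shift means
    the adversary's action influenced that agent more."""
    diff = np.asarray(obs_after, dtype=float) - np.asarray(obs_before, dtype=float)
    return float(np.linalg.norm(diff))

# Toy example: another agent's 3-dimensional observation before and
# after one adversary action.
r = auxiliary_reward([0.0, 1.0, 0.5], [0.2, 1.0, 0.1])
print(r)  # ~0.4472, i.e. sqrt(0.2)
```

A trigger policy could then be trained (e.g. with any standard RL algorithm) to maximize this reward, so the adversary learns actions that efficiently reach the others' observations.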
