Paper Title

PolicyCleanse: Backdoor Detection and Mitigation in Reinforcement Learning

Authors

Junfeng Guo, Ang Li, Cong Liu

Abstract

While real-world applications of reinforcement learning (RL) are becoming popular, the security and robustness of RL systems deserve more attention and exploration. In particular, recent works have revealed that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. a Trojan agent), causing catastrophic failure as soon as the agent observes the trigger action. To secure RL agents against malicious backdoors, in this work we formulate the problem of backdoor detection in a multi-agent competitive reinforcement learning system, with the objective of detecting Trojan agents as well as the corresponding potential trigger actions, and further attempting to mitigate their Trojan behavior. To solve this problem, we propose PolicyCleanse, which exploits the property that an activated Trojan agent's accumulated rewards degrade noticeably after several timesteps. Along with PolicyCleanse, we also design a machine unlearning-based approach that can effectively mitigate the detected backdoor. Extensive experiments demonstrate that the proposed methods can accurately detect Trojan agents and outperform existing backdoor mitigation baselines by at least 3% in winning rate across various types of agents and environments.
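The detection signal the abstract describes (an activated Trojan agent's accumulated reward collapsing within a few timesteps of the trigger action) can be illustrated with a minimal sketch. Everything below is a toy stand-in, not the paper's actual algorithm or benchmark: `rollout`, `is_trojan`, the toy environment, and the relative-drop threshold are all illustrative assumptions.

```python
# Sketch of the reward-degradation property used for Trojan-agent detection:
# roll the suspect agent out against benign opponent actions and against a
# candidate trigger action sequence, and flag it if its mean accumulated
# reward drops sharply. All names and thresholds here are hypothetical.

def rollout(agent_step, opponent_actions, horizon=20):
    """Accumulate the agent's per-step reward against a cyclic opponent action sequence."""
    total = 0.0
    for t in range(horizon):
        total += agent_step(opponent_actions[t % len(opponent_actions)])
    return total

def is_trojan(agent_step, benign_actions, candidate_trigger,
              drop_threshold=0.5, trials=30):
    """Flag the agent if its mean reward under the candidate trigger drops
    by more than `drop_threshold` (relative) versus benign play."""
    benign = sum(rollout(agent_step, benign_actions) for _ in range(trials)) / trials
    triggered = sum(rollout(agent_step, candidate_trigger) for _ in range(trials)) / trials
    return benign > 0 and (benign - triggered) / benign > drop_threshold

def make_toy_trojan():
    """Toy victim: earns reward 1 per step until it sees the trigger action
    'spin', after which its reward collapses to 0 (a Trojan-style failure)."""
    state = {"activated": False}
    def step(opp_action):
        if opp_action == "spin":
            state["activated"] = True
        return 0.0 if state["activated"] else 1.0
    return step

print(is_trojan(make_toy_trojan(), ["run"], ["spin"]))  # → True
print(is_trojan(lambda a: 1.0, ["run"], ["spin"]))      # benign agent → False
```

In PolicyCleanse itself the candidate trigger sequences are searched for rather than given, and the reward comparison is made over learned policies in a competitive environment; this sketch only shows why a large post-trigger reward drop separates Trojan agents from benign ones.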
