Paper Title
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Paper Authors
Paper Abstract
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training-time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an \emph{optimal stealthy attack} for different measures of attack cost. We provide sufficient technical conditions under which the attack is feasible and provide lower/upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an \emph{offline} setting where the agent is doing planning in the poisoned environment, and (ii) an \emph{online} setting where the agent is learning a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
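The reward-poisoning side of the framework can be illustrated on a heavily simplified special case: a one-state (multi-armed bandit) problem, where a policy's average reward is just its action's reward. Here the attacker lowers the rewards of non-target actions just enough that the target action is optimal by a margin `eps`, keeping the perturbation small. This is only an illustrative sketch: the function name, the per-arm clipping rule, and the choice to hold the target's reward fixed are assumptions, not the paper's method, which solves a joint optimization over rewards or transition dynamics in the full average-reward MDP subject to stealth constraints.

```python
import numpy as np

def poison_rewards(r, target, eps=0.1):
    """Illustrative reward-poisoning sketch (not the paper's algorithm).

    Lowers every non-target arm's reward to at most r[target] - eps,
    so the target arm is optimal by margin eps. Holding the target's
    reward fixed and clipping per-arm keeps the change minimal for
    this simple feasibility rule.
    """
    r = np.asarray(r, dtype=float)
    poisoned = r.copy()
    cap = r[target] - eps          # max reward any other arm may keep
    for a in range(len(r)):
        if a != target and poisoned[a] > cap:
            poisoned[a] = cap      # clip offending arms down to the cap
    return poisoned

r = [1.0, 0.8, 0.5]                # arm 2 is the attacker's target
p = poison_rewards(r, target=2, eps=0.1)
# after poisoning, arm 2 beats every other arm by at least eps
```

The attack cost in this sketch is the size of the perturbation (e.g. the L1 or L2 norm of `p - r`); the paper's bounds characterize how this cost behaves for general MDPs and for dynamics poisoning as well.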