Paper Title
Policy Teaching via Environment Poisoning: Training-time Adversarial Attacks against Reinforcement Learning
Paper Authors
Paper Abstract
We study a security threat to reinforcement learning where an attacker poisons the learning environment to force the agent into executing a target policy chosen by the attacker. As a victim, we consider RL agents whose objective is to find a policy that maximizes average reward in undiscounted infinite-horizon problem settings. The attacker can manipulate the rewards or the transition dynamics in the learning environment at training-time and is interested in doing so in a stealthy manner. We propose an optimization framework for finding an \emph{optimal stealthy attack} for different measures of attack cost. We provide sufficient technical conditions under which the attack is feasible and provide lower/upper bounds on the attack cost. We instantiate our attacks in two settings: (i) an \emph{offline} setting where the agent is doing planning in the poisoned environment, and (ii) an \emph{online} setting where the agent is learning a policy using a regret-minimization framework with poisoned feedback. Our results show that the attacker can easily succeed in teaching any target policy to the victim under mild conditions and highlight a significant security threat to reinforcement learning agents in practice.
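The reward-poisoning side of the framework can be illustrated on a heavily simplified special case: a one-state (multi-armed bandit) problem, where a policy's average reward is just its action's reward. Here the attacker lowers the rewards of non-target actions just enough that the target action is optimal by a margin `eps`, keeping the perturbation small. This is only an illustrative sketch: the function name, the per-arm clipping rule, and the choice to hold the target's reward fixed are assumptions, not the paper's method, which solves a joint optimization over rewards or transition dynamics in the full average-reward MDP subject to stealth constraints.

```python
import numpy as np

def poison_rewards(r, target, eps=0.1):
    """Illustrative reward-poisoning sketch (not the paper's algorithm).

    Lowers every non-target arm's reward to at most r[target] - eps,
    so the target arm is optimal by margin eps. Holding the target's
    reward fixed and clipping per-arm keeps the change minimal for
    this simple feasibility rule.
    """
    r = np.asarray(r, dtype=float)
    poisoned = r.copy()
    cap = r[target] - eps          # max reward any other arm may keep
    for a in range(len(r)):
        if a != target and poisoned[a] > cap:
            poisoned[a] = cap      # clip offending arms down to the cap
    return poisoned

r = [1.0, 0.8, 0.5]                # arm 2 is the attacker's target
p = poison_rewards(r, target=2, eps=0.1)
# after poisoning, arm 2 beats every other arm by at least eps
```

The attack cost in this sketch is the size of the perturbation (e.g. the L1 or L2 norm of `p - r`); the paper's bounds characterize how this cost behaves for general MDPs and for dynamics poisoning as well.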