Reward Delay Attacks on Deep Reinforcement Learning

Paper Title

Reward Delay Attacks on Deep Reinforcement Learning

Authors

Anindya Sarkar, Jiarui Feng, Yevgeniy Vorobeychik, Christopher Gill, Ning Zhang

Abstract

Most reinforcement learning algorithms implicitly assume strong synchrony. We present novel attacks targeting Q-learning that exploit a vulnerability entailed by this assumption by delaying the reward signal for a limited time period. We consider two types of attack goals: targeted attacks, which aim to cause a target policy to be learned, and untargeted attacks, which simply aim to induce a policy with a low reward. We evaluate the efficacy of the proposed attacks through a series of experiments. Our first observation is that reward-delay attacks are extremely effective when the goal is simply to minimize reward. Indeed, we find that even naive baseline reward-delay attacks are highly successful in minimizing the reward. Targeted attacks, on the other hand, are more challenging, although we nevertheless demonstrate that the proposed approaches remain highly effective at achieving the attacker's targets. In addition, we introduce a second threat model that captures a minimal mitigation ensuring that rewards cannot be used out of sequence. We find that this mitigation remains insufficient to ensure robustness to attacks that delay rewards while preserving their order.
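To make the order-preserving threat model concrete, the following is a minimal Python sketch of a reward stream delayed by a fixed number of steps before it reaches a tabular Q-learning update. It is an illustration only, not the attack from the paper: the paper targets deep Q-learning and chooses its delays strategically, whereas this sketch uses a fixed delay. The names `OrderPreservingRewardDelay`, `q_update`, and the `delay`, `fill`, `alpha`, and `gamma` parameters are all hypothetical.

```python
from collections import deque


class OrderPreservingRewardDelay:
    """Hypothetical attacker that holds each reward for `delay` steps.

    Rewards are released in the order they were produced (FIFO), so the
    learner never sees them out of sequence, only late.
    """

    def __init__(self, delay, fill=0.0):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        self.fill = fill  # assumed placeholder reward while the buffer fills
        self.buffer = deque()

    def step(self, true_reward):
        """Take the environment's reward and return what the learner observes."""
        self.buffer.append(true_reward)
        if len(self.buffer) > self.delay:
            return self.buffer.popleft()
        return self.fill


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Standard tabular Q-learning update on a dict-of-lists table, in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


# The learner receives each reward two steps after it was generated.
attacker = OrderPreservingRewardDelay(delay=2)
observed = [attacker.step(r) for r in [1.0, 0.0, 5.0, 0.0, 0.0]]
```

In this sketch the reward of 5.0 generated at the third step is only observed at the fifth, so a learner calling `q_update` with the observed reward would credit it to a later state-action pair than the one that earned it.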
