Finding Effective Security Strategies through Reinforcement Learning and Self-Play

Paper title

Finding Effective Security Strategies through Reinforcement Learning and Self-Play

Authors

Kim Hammar, Rolf Stadler

Abstract

We present a method to automatically find security strategies for the use case of intrusion prevention. Following this method, we model the interaction between an attacker and a defender as a Markov game and let attack and defense strategies evolve through reinforcement learning and self-play without human intervention. Using a simple infrastructure configuration, we demonstrate that effective security strategies can emerge from self-play. This shows that self-play, which has been applied in other domains with great success, can be effective in the context of network security. Inspection of the converged policies shows that the emerged policies reflect common-sense knowledge and are similar to strategies of humans. Moreover, we address known challenges of reinforcement learning in this domain and present an approach that uses function approximation, an opponent pool, and an autoregressive policy representation. Through evaluations, we show that our method is superior to two baseline methods but that policy convergence in self-play remains a challenge.
