GAIL-PT: A Generic Intelligent Penetration Testing Framework with Generative Adversarial Imitation Learning

Paper Title

GAIL-PT: A Generic Intelligent Penetration Testing Framework with Generative Adversarial Imitation Learning

Authors

Chen, Jinyin; Hu, Shulong; Zheng, Haibin; Xing, Changyou; Zhang, Guomin

Abstract

Penetration testing (PT) is an effective tool for network testing and vulnerability mining: it simulates a hacker's attack to obtain valuable information, and it is applied in several areas. Compared with manual PT, intelligent PT has become mainstream because it takes less time and incurs lower labor costs. Unfortunately, reinforcement learning (RL)-based PT still struggles in real exploitation scenarios because the agent's action space is usually high-dimensional and discrete, which makes it difficult for the algorithm to converge. Besides, most PT methods still rely on the decisions of security experts. To address these challenges, we introduce, for the first time, expert knowledge to guide the agent toward better decisions in RL-based PT, and we propose a Generative Adversarial Imitation Learning-based generic intelligent Penetration Testing framework, denoted GAIL-PT, to address both the high labor costs caused by the involvement of security experts and the high-dimensional discrete action space. Specifically, first, we manually collect state-action pairs to construct an expert knowledge base when a pre-trained RL / DRL model executes successful penetration tests. Second, we feed the expert knowledge and the state-action pairs generated online by the different RL / DRL models into the GAIL discriminator for training. Finally, we use the discriminator's output reward to guide the agent toward actions with a higher penetration success rate, improving PT performance. Extensive experiments on a real target host and in simulated network scenarios show that GAIL-PT achieves state-of-the-art (SOTA) penetration performance compared with DeepExploit in exploiting the real target Metasploitable2 and with Q-learning in optimizing the penetration path, not only in small-scale network environments with or without honeypots but also in a large-scale virtual network environment.
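
The three steps in the abstract (expert state-action pairs, discriminator training, discriminator-derived reward) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything below is an assumption made for illustration: the toy state and action spaces, the one-hot encoding, the logistic-regression discriminator, and the reward form -log(1 - D(s, a)), which is one common GAIL convention in which D is trained to score expert pairs near 1.

```python
import numpy as np

def one_hot_pair(state, action, n_states, n_actions):
    """Encode a discrete (state, action) pair as a one-hot feature vector."""
    x = np.zeros(n_states * n_actions)
    x[state * n_actions + action] = 1.0
    return x

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_discriminator(expert_x, agent_x, lr=0.5, steps=500):
    """Fit a logistic-regression discriminator by gradient descent.

    Labels: 1 for expert pairs, 0 for agent-generated pairs.
    """
    X = np.vstack([expert_x, agent_x])
    y = np.concatenate([np.ones(len(expert_x)), np.zeros(len(agent_x))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        grad = p - y                      # gradient of cross-entropy w.r.t. logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def imitation_reward(x, w, b, eps=1e-8):
    """Reward -log(1 - D(x)): larger when the pair looks expert-like."""
    d = sigmoid(x @ w + b)
    return -np.log(1.0 - d + eps)

# Hypothetical toy setting: 3 states, 2 actions; the "expert" always picks action 1.
N_STATES, N_ACTIONS = 3, 2
expert_pairs = [(s, 1) for s in range(N_STATES)] * 10
# Stand-in for pairs generated online by an untrained agent: every pair, equally often.
agent_pairs = [(s, a) for s in range(N_STATES) for a in range(N_ACTIONS)] * 5

expert_x = np.array([one_hot_pair(s, a, N_STATES, N_ACTIONS) for s, a in expert_pairs])
agent_x = np.array([one_hot_pair(s, a, N_STATES, N_ACTIONS) for s, a in agent_pairs])

w, b = train_discriminator(expert_x, agent_x)

r_expert_like = imitation_reward(one_hot_pair(0, 1, N_STATES, N_ACTIONS), w, b)
r_other = imitation_reward(one_hot_pair(0, 0, N_STATES, N_ACTIONS), w, b)
print(f"reward for expert-like pair: {r_expert_like:.3f}")
print(f"reward for non-expert pair:  {r_other:.3f}")
```

In this toy setting the expert-like pair receives the larger reward, so an RL agent trained on that reward would be nudged toward the expert's choices. In a full GAIL loop the discriminator and the agent's policy would be updated alternately, which this sketch omits.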
