Paper Title

SecureBERT: A Domain-Specific Language Model for Cybersecurity

Authors

Aghaei, Ehsan, Niu, Xi, Shadid, Waseem, Al-Shaer, Ehab

Abstract

Natural Language Processing (NLP) has recently gained wide attention in cybersecurity, particularly in Cyber Threat Intelligence (CTI) and cyber automation. Increased connectivity and automation have revolutionized the world's economic and cultural infrastructures, while also introducing risks in the form of cyber attacks. CTI is information that helps cybersecurity analysts make intelligent security decisions; it is often delivered as natural language text, which must be transformed into a machine-readable format through an automated procedure before it can be used for automated security measures. This paper proposes SecureBERT, a cybersecurity language model capable of capturing text connotations in cybersecurity text (e.g., CTI) and therefore successful in automating many critical cybersecurity tasks that would otherwise rely on human expertise and time-consuming manual effort. SecureBERT has been trained on a large corpus of cybersecurity text. To make SecureBERT effective not only in retaining general English understanding but also when applied to text with cybersecurity implications, we developed a customized tokenizer as well as a method to alter pre-trained weights. SecureBERT is evaluated using the standard Masked Language Model (MLM) test as well as two additional standard NLP tasks. Our evaluation studies show that SecureBERT\footnote{\url{https://github.com/ehsanaghaei/SecureBERT}} outperforms existing similar models, confirming its capability for solving crucial NLP tasks in cybersecurity.
