Automated Attack Synthesis by Extracting Finite State Machines from Protocol Specification Documents

Paper title

Automated Attack Synthesis by Extracting Finite State Machines from Protocol Specification Documents

Authors

Maria Leonor Pacheco, Max von Hippel, Ben Weintraub, Dan Goldwasser, Cristina Nita-Rotaru

Abstract

Automated attack discovery techniques, such as attacker synthesis or model-based fuzzing, provide powerful ways to ensure network protocols operate correctly and securely. Such techniques, in general, require a formal representation of the protocol, often in the form of a finite state machine (FSM). Unfortunately, many protocols are only described in English prose, and implementing even a simple network protocol as an FSM is time-consuming and prone to subtle logical errors. Automatically extracting protocol FSMs from documentation can significantly contribute to increased use of these techniques and result in more robust and secure protocol implementations. In this work we focus on attacker synthesis as a representative technique for protocol security, and on RFCs as a representative format for protocol prose description. Unlike other works that rely on rule-based approaches or use off-the-shelf NLP tools directly, we suggest a data-driven approach for extracting FSMs from RFC documents. Specifically, we use a hybrid approach consisting of three key steps: (1) large-scale word-representation learning for technical language, (2) focused zero-shot learning for mapping protocol text to a protocol-independent information language, and (3) rule-based mapping from protocol-independent information to a specific protocol FSM. We show the generalizability of our FSM extraction by using the RFCs for six different protocols: BGPv4, DCCP, LTP, PPTP, SCTP and TCP. We demonstrate how automated extraction of an FSM from an RFC can be applied to the synthesis of attacks, with TCP and DCCP as case-studies. Our approach shows that it is possible to automate attacker synthesis against protocols by using textual specifications such as RFCs.
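
To make the target of step (3) concrete, here is a minimal sketch of what a protocol FSM might look like once protocol-independent records have been mapped to transitions. This is not the authors' implementation: the `FSM` class, the `build_fsm` helper, the record fields (`state`, `event`, `next_state`), and the event names are all hypothetical, chosen only for illustration. The records are hand-written from a simplified view of the TCP connection-establishment diagram in RFC 793; they are not output of the extraction pipeline described in the abstract.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class FSM:
    """A deterministic FSM stored as a map (state, event) -> next state."""
    initial: str
    transitions: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add(self, state: str, event: str, next_state: str) -> None:
        self.transitions[(state, event)] = next_state

    def run(self, events: Iterable[str]) -> str:
        """Replay a sequence of events and return the final state."""
        state = self.initial
        for event in events:
            key = (state, event)
            if key not in self.transitions:
                raise ValueError(f"no transition from {state!r} on {event!r}")
            state = self.transitions[key]
        return state


def build_fsm(records: List[dict], initial: str) -> FSM:
    """Hypothetical rule-based step: one transition per annotated record."""
    fsm = FSM(initial)
    for rec in records:
        fsm.add(rec["state"], rec["event"], rec["next_state"])
    return fsm


# Hand-written, simplified records for TCP connection establishment
# (assumption: real extracted records would carry far more detail).
RECORDS = [
    {"state": "CLOSED", "event": "passive_open", "next_state": "LISTEN"},
    {"state": "CLOSED", "event": "active_open", "next_state": "SYN_SENT"},
    {"state": "LISTEN", "event": "rcv_SYN", "next_state": "SYN_RECEIVED"},
    {"state": "SYN_SENT", "event": "rcv_SYN_ACK", "next_state": "ESTABLISHED"},
    {"state": "SYN_RECEIVED", "event": "rcv_ACK", "next_state": "ESTABLISHED"},
]

tcp_fsm = build_fsm(RECORDS, initial="CLOSED")
client_state = tcp_fsm.run(["active_open", "rcv_SYN_ACK"])
server_state = tcp_fsm.run(["passive_open", "rcv_SYN", "rcv_ACK"])
```

Both the client-side and server-side event sequences end in `ESTABLISHED`, while an event with no matching transition raises `ValueError`. A formal model of roughly this shape is the kind of input that attacker synthesis or model-based fuzzing needs, and it is what is tedious and error-prone to write by hand.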
