Paper Title
NeuDep: Neural Binary Memory Dependence Analysis
Paper Authors
Paper Abstract
Determining whether multiple instructions can access the same memory location is a critical task in binary analysis. It is challenging because statically computing precise alias information is undecidable in theory. The problem is aggravated at the binary level by compiler optimizations and the absence of symbols and types. Existing approaches either produce many spurious dependencies due to conservative analysis or scale poorly to complex binaries. We present a new machine-learning-based approach that predicts memory dependencies by exploiting the model's learned knowledge of how binary programs execute. Our approach features (i) a self-supervised procedure that pretrains a neural network to reason over binary code and its dynamic value flows through memory addresses, followed by (ii) supervised finetuning to infer memory dependencies statically. To facilitate efficient learning, we develop dedicated neural architectures that encode the heterogeneous inputs (i.e., code, data values, and memory addresses from traces) with specific modules and fuse them with a composition learning strategy. We implement our approach in NeuDep and evaluate it on 41 popular software projects compiled by 2 compilers, 4 optimizations, and 4 obfuscation passes. We demonstrate that NeuDep is more precise (1.5x) and faster (3.5x) than the current state of the art. Extensive probing studies on security-critical reverse engineering tasks suggest that NeuDep understands memory access patterns, learns function signatures, and is able to match indirect calls. All these tasks either assist or benefit from inferring memory dependencies. Notably, NeuDep also outperforms the current state of the art on these tasks.
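To make the notion of a memory dependence concrete, the following minimal sketch marks two instructions as dependent when they access overlapping bytes and at least one of them writes. It is an illustration only, not NeuDep's method: the `Access` record, the instruction labels, and the toy trace are all hypothetical. It works on a single dynamic trace with known addresses, whereas the abstract's goal is to infer such dependencies statically.

```python
from itertools import combinations
from typing import NamedTuple


class Access(NamedTuple):
    """One memory access from a trace (hypothetical record layout)."""
    instr: str  # instruction label, e.g. "i1"
    kind: str   # "R" for read, "W" for write
    addr: int   # first byte accessed
    size: int   # number of bytes accessed


def overlaps(a: Access, b: Access) -> bool:
    """Return True if the two accessed byte ranges share at least one byte."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


def memory_dependencies(trace: list[Access]) -> set[tuple[str, str]]:
    """Collect (earlier, later) instruction pairs that touch overlapping
    memory where at least one access is a write."""
    deps = set()
    # combinations() yields pairs in trace order, so `a` precedes `b`.
    for a, b in combinations(trace, 2):
        if a.instr == b.instr:
            continue  # simplification: ignore self-dependencies across iterations
        if "W" in (a.kind, b.kind) and overlaps(a, b):
            deps.add((a.instr, b.instr))
    return deps


# Toy trace; the addresses are made up for illustration.
trace = [
    Access("i1", "W", 0x1000, 8),  # writes bytes 0x1000..0x1007
    Access("i2", "R", 0x1004, 4),  # reads the upper half of i1's write
    Access("i3", "R", 0x2000, 4),  # unrelated location
    Access("i4", "W", 0x2002, 2),  # overlaps the bytes i3 read
]
print(sorted(memory_dependencies(trace)))
# → [('i1', 'i2'), ('i3', 'i4')]
```

The pairwise comparison is quadratic in trace length and only reflects the one execution that was traced. A static analysis has to reason about every possible execution without observed addresses, which is where the conservative over-approximation and scalability problems mentioned in the abstract arise.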