Paper Title


Dependency-Based Neural Representations for Classifying Lines of Programs

Authors

Shashank Srikant, Nicolas Lesimple, Una-May O'Reilly

Abstract


We investigate the problem of classifying a line of program as containing a vulnerability or not using machine learning. Such a line-level classification task calls for a program representation which goes beyond reasoning from the tokens present in the line. We seek a distributed representation in a latent feature space which can capture the control and data dependencies of tokens appearing on a line of program, while also ensuring lines of similar meaning have similar features. We present a neural architecture, Vulcan, that successfully demonstrates both these requirements. It extracts contextual information about tokens in a line and inputs them as Abstract Syntax Tree (AST) paths to a bi-directional LSTM with an attention mechanism. It concurrently represents the meanings of tokens in a line by recursively embedding the lines where they are most recently defined. In our experiments, Vulcan compares favorably with a state-of-the-art classifier, which requires significant preprocessing of programs, suggesting the utility of using deep learning to model program dependence information.
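The abstract outlines the general architecture: AST paths carrying a token's contextual information are fed to a bi-directional LSTM, an attention mechanism aggregates the per-step outputs, and the pooled representation is used to classify a line as vulnerable or not. Below is a minimal, hypothetical PyTorch sketch of that general pattern; the class name, dimensions, and aggregation scheme are illustrative assumptions, not the authors' Vulcan implementation (which additionally embeds, recursively, the lines where tokens are most recently defined).

```python
# Hypothetical sketch: AST-path token sequences -> bi-LSTM -> attention pooling
# -> binary (vulnerable / not vulnerable) classification of a program line.
import torch
import torch.nn as nn

class PathAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # One attention score per time step over the bi-LSTM outputs.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classify = nn.Linear(2 * hidden_dim, 2)  # vulnerable vs. not

    def forward(self, path_tokens):
        # path_tokens: (batch, seq_len) integer ids of AST-path tokens
        x = self.embed(path_tokens)                   # (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)                           # (batch, seq_len, 2*hidden_dim)
        weights = torch.softmax(self.attn(h), dim=1)  # (batch, seq_len, 1)
        context = (weights * h).sum(dim=1)            # attention-weighted summary
        return self.classify(context)                 # (batch, 2) logits per line

# Usage example with a toy vocabulary of 500 path tokens and two padded
# sequences of length 10 (one per program line being classified).
model = PathAttentionClassifier(vocab_size=500)
dummy_paths = torch.randint(1, 500, (2, 10))
logits = model(dummy_paths)
print(logits.shape)  # torch.Size([2, 2])
```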
