Paper title
A Library for Representing Python Programs as Graphs for Machine Learning
Paper authors
Paper abstract
Graph representations of programs are commonly a central element of research on machine learning for code. We introduce an open source Python library, python_graphs, that applies static analysis to construct graph representations of Python programs suitable for training machine learning models. Our library admits the construction of control-flow graphs, data-flow graphs, and composite "program graphs" that combine control-flow, data-flow, syntactic, and lexical information about a program. We present the capabilities and limitations of the library, perform a case study applying the library to millions of competitive programming submissions, and showcase the library's utility for machine learning research.
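
To make the idea of a composite program graph concrete, the following is a minimal sketch that uses only Python's standard `ast` module. It is not the python_graphs API and does not reproduce the library's analyses: the function name `build_toy_program_graph` and the edge labels `AST_CHILD` and `NEXT_USE` are hypothetical names chosen for illustration. The sketch combines syntactic edges (parent to child in the AST) with a crude lexical edge linking successive occurrences of the same variable name, and it omits control flow and real data-flow analysis entirely.

```python
import ast
from collections import defaultdict


def build_toy_program_graph(source):
    """Return (nodes, edges) for a toy graph of `source`.

    nodes: list of AST node type names, indexed by integer node id.
    edges: list of (src_id, dst_id, edge_type) tuples.
    Illustrative only; this is not how python_graphs is implemented.
    """
    tree = ast.parse(source)
    nodes, edges = [], []
    ids = {}  # maps id(ast_node) -> integer node id

    # Assign one id per AST node. Load/Store context markers are skipped,
    # and CPython may share operator node objects between parents, so
    # objects that were already seen are not registered twice.
    for node in ast.walk(tree):
        if isinstance(node, ast.expr_context) or id(node) in ids:
            continue
        ids[id(node)] = len(nodes)
        nodes.append(type(node).__name__)

    # Syntactic edges: parent -> child in the AST.
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            if id(child) in ids:
                edges.append((ids[id(node)], ids[id(child)], "AST_CHILD"))

    # Toy lexical edges: link successive occurrences of each variable
    # name in source order (a rough stand-in for data-flow information).
    occurrences = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            occurrences[node.id].append(node)
    for name_nodes in occurrences.values():
        name_nodes.sort(key=lambda n: (n.lineno, n.col_offset))
        for prev, nxt in zip(name_nodes, name_nodes[1:]):
            edges.append((ids[id(prev)], ids[id(nxt)], "NEXT_USE"))

    return nodes, edges


if __name__ == "__main__":
    nodes, edges = build_toy_program_graph("x = 1\ny = x + 2\n")
    for src, dst, kind in edges:
        print(f"{nodes[src]}#{src} -{kind}-> {nodes[dst]}#{dst}")
```

A node list plus a typed edge list of this kind is a common input format for graph neural networks, which is why a single structure carrying several edge types is convenient for machine learning on code.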