Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

Paper title

Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler

Authors

Timofey Bryksin, Victor Petukhov, Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko, Nikita Povarov

Abstract

In this work, we apply anomaly detection to source code and bytecode to facilitate the development of a programming language and its compiler. We define an anomaly as a code fragment that differs from typical code written in a particular programming language. Identifying such code fragments is beneficial to both language developers and end users, since anomalies may indicate potential issues with the compiler or with runtime performance. Moreover, anomalies could correspond to problems in language design. For this study, we choose Kotlin as the target programming language. We outline and discuss approaches to obtaining vector representations of source code and bytecode and to detecting anomalies across vectorized code snippets. The paper presents a method that aims to detect two types of anomalies: syntax tree anomalies and so-called compiler-induced anomalies that arise only in the compiled bytecode. We describe several experiments that employ different combinations of vectorization and anomaly detection techniques and discuss the types of detected anomalies and their usefulness for language developers. We demonstrate that the extracted anomalies and the underlying extraction technique provide additional value for language development.
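The general idea of vectorizing code snippets and then flagging the ones that differ from typical code can be illustrated with a minimal sketch. This is not the authors' pipeline: the token-count features in `FEATURES`, the z-score distance used as an anomaly score, and the function names are all assumptions chosen for illustration, and the abstract does not state which vectorization or detection techniques the paper actually combines.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: counts of a few Kotlin keywords and symbols.
# A real system would use far richer features (e.g. syntax tree statistics).
FEATURES = ["fun", "if", "when", "for", "while", "return", "{", "("]

# Matches identifiers/keywords, plus the two symbols listed in FEATURES.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|[{(]")


def vectorize(snippet):
    """Map a code snippet to a fixed-length vector of feature counts."""
    counts = Counter(TOKEN_RE.findall(snippet))
    return [float(counts[f]) for f in FEATURES]


def anomaly_scores(vectors):
    """Score each vector by its standardized distance from the mean vector."""
    n = len(vectors)
    dims = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dims)]
    # Fall back to 1.0 when a feature has zero variance, to avoid dividing by zero.
    std = [
        math.sqrt(sum((v[d] - mean[d]) ** 2 for v in vectors) / n) or 1.0
        for d in range(dims)
    ]
    return [
        math.sqrt(sum(((v[d] - mean[d]) / std[d]) ** 2 for d in range(dims)))
        for v in vectors
    ]


def most_anomalous(snippets, top_k=1):
    """Return indices of the top_k snippets with the highest anomaly score."""
    scores = anomaly_scores([vectorize(s) for s in snippets])
    ranked = sorted(range(len(snippets)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```

Calling `most_anomalous` on a list of Kotlin function bodies returns the indices of those whose feature counts lie farthest from the corpus average, such as a function with unusually deep `if` nesting among otherwise short functions.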
