论文标题

语义克隆通过概率软件建模检测

Semantic Clone Detection via Probabilistic Software Modeling

论文作者

Thaller, Hannes, Linsbauer, Lukas, Egyed, Alexander

论文摘要

语义克隆检测是找到具有相似或相等运行时行为的程序元素的过程。例如,检测阶乘计算的递归和迭代实现之间的语义平等。语义克隆检测是克隆探测器的事实上的技术边界。近年来,使用有趣的新方法对该边界进行了测试。本文贡献了一种语义克隆检测方法,该方法检测具有0%句法相似性的克隆。我们通过概率软件建模(SCD-PSM)表示语义克隆检测,作为语义克隆检测的稳定且精确的解决方案。 PSM构建了能够评估和生成运行时数据的程序的概率模型。 SCD-PSM利用此模型及其模型元素来查找行为相等的模型元素。然后将这种行为平等推广到原始程序元素的语义平等。它使用模型元素之间的可能性作为距离度量。然后,考虑到预先指定且可控的假阳性速率,它采用了可能性比显着性测试来决定该距离是否显着。 SCD-PSM的输出是对程序元素(即方法),它们的距离以及对它们是否是克隆的决定。 MATTHEWS相关系数大于0.9,SCD-PSM产生了出色的结果。这些结果是在经典语义克隆检测问题上获得的,例如检测算法的递归和迭代版本,也可以在编码竞争中使用的复杂问题。

Semantic clone detection is the process of finding program elements with similar or equal runtime behavior. For example, detecting the semantic equality between the recursive and iterative implementation of the factorial computation. Semantic clone detection is the de facto technical boundary of clone detectors. In recent years, this boundary has been tested using interesting new approaches. This article contributes a semantic clone detection approach that detects clones that have 0% syntactic similarity. We present Semantic Clone Detection via Probabilistic Software Modeling (SCD-PSM) as a stable and precise solution to semantic clone detection. PSM builds a probabilistic model of a program that is capable of evaluating and generating runtime data. SCD-PSM leverages this model and its model elements for finding behaviorally equal model elements. This behavioral equality is then generalized to semantic equality of the original program elements. It uses the likelihood between model elements as a distance metric. Then, it employs the likelihood ratio significance test to decide whether this distance is significant, given a pre-specified and controllable false-positive rate. The output of SCD-PSM are pairs of program elements (i.e., methods), their distance, and a decision on whether they are clones or not. SCD-PSM yields excellent results with a Matthews Correlation Coefficient greater than 0.9. These results are obtained on classical semantic clone detection problems such as detecting recursive and iterative versions of an algorithm, but also on complex problems used in coding competitions.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源