SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics

Title

SelfAPR: Self-supervised Program Repair with Test Execution Diagnostics

Authors

Ye, He; Martinez, Matias; Luo, Xiapu; Zhang, Tao; Monperrus, Martin

Abstract

Learning-based program repair has achieved good results in a recent series of papers. Yet, we observe that related work fails to repair some bugs because of a lack of knowledge about 1) the application domain of the program being repaired, and 2) the fault type being repaired. In this paper, we solve both problems by changing the learning paradigm from supervised training to self-supervised training in an approach called SelfAPR. First, SelfAPR generates training samples on disk by perturbing a previous version of the program being repaired, forcing the neural model to capture project-specific knowledge. This differs from previous work, which relies on mined past commits. Second, SelfAPR executes all training samples and extracts and encodes test execution diagnostics into the input representation, steering the neural model toward fixing the kind of fault at hand. This differs from existing studies, which only consider static source code as input. We implement SelfAPR and evaluate it in a systematic manner. We generate 1 039 873 training samples obtained by perturbing 17 open-source projects. We evaluate SelfAPR on 818 bugs from Defects4J; SelfAPR correctly repairs 110 of them, outperforming all the supervised learning repair approaches.
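The abstract describes two ideas: building (buggy, fixed) training pairs by perturbing the project's own earlier code, and executing each perturbed version so its test diagnostic becomes part of the model input. The minimal Python sketch below illustrates both under stated assumptions. The two perturbation operators (`swap_relational`, `off_by_one`), the `[BUG]`/`[CONTEXT]`/`[DIAG]` token layout, and the `run_tests` callback are hypothetical. The callback stands in for compiling the perturbed project and running its test suite. None of these names come from SelfAPR's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TrainingSample:
    model_input: str  # perturbed (buggy) line, its context, and the encoded diagnostic
    target: str       # the original line, which the model learns to restore


def swap_relational(line: str) -> Optional[str]:
    """Hypothetical operator: replace the first relational operator found."""
    for old, new in (("<=", "<"), (">=", ">"), ("==", "!=")):
        if old in line:
            return line.replace(old, new, 1)
    return None


def off_by_one(line: str) -> Optional[str]:
    """Hypothetical operator: increment the first integer literal on the line."""
    match = re.search(r"\b\d+\b", line)
    if match is None:
        return None
    return line[:match.start()] + str(int(match.group()) + 1) + line[match.end():]


OPERATORS = [swap_relational, off_by_one]


def encode_input(buggy: str, context: str, diagnostic: str) -> str:
    # Hypothetical token layout; the real SelfAPR representation may differ.
    return f"[BUG] {buggy} [CONTEXT] {context} [DIAG] {diagnostic}"


def make_samples(lines: List[str], run_tests: Callable[[int, str], str],
                 window: int = 1) -> List[TrainingSample]:
    """Perturb each line, 'execute' the result, and pair it with the original."""
    samples = []
    for i, original in enumerate(lines):
        for op in OPERATORS:
            buggy = op(original)
            if buggy is None or buggy == original:
                continue
            # Execute the perturbed program and keep whatever diagnostic it yields.
            diagnostic = run_tests(i, buggy)
            start, end = max(0, i - window), min(len(lines), i + window + 1)
            context = " ".join(lines[j].strip() for j in range(start, end) if j != i)
            samples.append(TrainingSample(
                encode_input(buggy.strip(), context, diagnostic), original.strip()))
    return samples


# Toy "previous version" of a Java method body, one statement per entry.
program = [
    "int lo = 0;",
    "while (lo <= hi) {",
    "  lo = mid + 1;",
    "}",
]


def fake_runner(index: int, new_line: str) -> str:
    # Stand-in for compiling the perturbed project and running its tests.
    return f"AssertionFailedError in SearchTest (line {index} changed)"


for sample in make_samples(program, fake_runner):
    print(sample.model_input, "=>", sample.target)
```

Each sample uses the project's own original line as the target, so the supervision comes from the code being repaired rather than from mined commits. The diagnostic slot is what lets a model condition on how the perturbation failed. In a real pipeline, that string would come from actually building and testing the perturbed project.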
