Paper title

Can we learn from developer mistakes? Learning to localize and repair real bugs from real bug fixes

Authors

Cedric Richter, Heike Wehrheim

Abstract

Real bug fixes found in open-source repositories seem to be the perfect source for learning to localize and repair real bugs. However, the absence of large-scale bug fix collections has in the past made it difficult to effectively exploit real bug fixes when training larger neural models. In contrast, artificial bugs (produced by mutating existing source code) can easily be obtained at sufficient scale and are therefore often preferred for training existing approaches. Still, localization and repair models trained on artificial bugs usually underperform when faced with real bugs. This raises the question of whether bug localization and repair models trained on real bug fixes are more effective at localizing and repairing real bugs. We address this question by introducing RealiT, a pre-train-and-fine-tune approach for effectively learning to localize and repair real bugs from real bug fixes. RealiT is first pre-trained on a large number of artificial bugs produced by traditional mutation operators and then fine-tuned on a smaller set of real bug fixes. Fine-tuning does not require any modifications of the learning algorithm and can therefore be easily adopted in various training scenarios for bug localization or repair (even when real training data is scarce). In addition, we found empirically that training on real bug fixes with RealiT nearly doubles the localization performance of an existing model on real bugs while maintaining or even improving its repair performance.
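
To make the two kinds of training data more concrete, here is a rough Python sketch (Python 3.9 or later, standard library only). It shows (a) one traditional mutation operator, operator replacement via the standard `ast` module, that turns correct code into an artificial bug labeled with a localization target (the line) and a repair target (the original operator), and (b) a two-phase schedule in which the same training step first consumes artificial bugs and then real bug fixes. The operator table and the names `BugExample`, `mutate_operator`, and `two_phase_schedule` are hypothetical and chosen for illustration only; they are not RealiT's actual mutation operators, model, or data format.

```python
import ast
import random
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

# Illustrative operator-replacement tables (an assumption, not RealiT's operator set).
BINOP_SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div, ast.Div: ast.Mult}
CMPOP_SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE,
               ast.GtE: ast.Gt, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


@dataclass
class BugExample:
    buggy_source: str  # program the model receives as input
    bug_line: int      # localization target: line of the faulty operator
    fix_op: str        # repair target: name of the operator that fixes the bug


def mutate_operator(source: str, rng: random.Random) -> Optional[BugExample]:
    """Inject one artificial bug by replacing a binary or comparison operator."""
    tree = ast.parse(source)
    # Collect every node whose (first) operator has a replacement in the tables.
    sites = [n for n in ast.walk(tree)
             if (isinstance(n, ast.BinOp) and type(n.op) in BINOP_SWAPS)
             or (isinstance(n, ast.Compare) and type(n.ops[0]) in CMPOP_SWAPS)]
    if not sites:
        return None  # no applicable mutation site in this program
    node = rng.choice(sites)
    if isinstance(node, ast.BinOp):
        original = type(node.op)
        node.op = BINOP_SWAPS[original]()
    else:
        original = type(node.ops[0])
        node.ops[0] = CMPOP_SWAPS[original]()
    # ast.unparse drops comments and blank lines, so bug_line matches the buggy
    # source only for simple, comment-free code; acceptable for this sketch.
    return BugExample(ast.unparse(tree), node.lineno, original.__name__)


def two_phase_schedule(artificial: Iterable[BugExample], real: Iterable[BugExample],
                       pretrain_epochs: int, finetune_epochs: int
                       ) -> Iterator[Tuple[str, BugExample]]:
    """Yield (phase, example): pre-train on artificial bugs, then fine-tune on real fixes.

    The same training step would consume both phases; only the data source changes.
    """
    artificial, real = list(artificial), list(real)
    for _ in range(pretrain_epochs):
        for example in artificial:
            yield "pretrain", example
    for _ in range(finetune_epochs):
        for example in real:
            yield "finetune", example


if __name__ == "__main__":
    bug = mutate_operator("def in_range(i, n):\n    return i < n\n", random.Random(0))
    print(bug)
```

Because fine-tuning in this sketch only swaps the data source, a real bug-fix pair (the pre-fix version as input, the changed line and the fixed operator as targets) could be stored in the same hypothetical `BugExample` format and fed to an unchanged training step, which mirrors the abstract's point that fine-tuning needs no change to the learning algorithm.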