Paper Title

Patching as Translation: the Data and the Metaphor

Authors

Ding, Yangruibo; Ray, Baishakhi; Devanbu, Premkumar; Hellendoorn, Vincent J.

Abstract

Machine Learning models from other fields, like Computational Linguistics, have been transplanted to Software Engineering tasks, often quite successfully. Yet a transplanted model's initial success at a given task does not necessarily mean it is well-suited for the task. In this work, we examine a common example of this phenomenon: the conceit that "software patching is like language translation". We demonstrate empirically that there are subtle but critical distinctions between sequence-to-sequence models and translation models: while program repair benefits greatly from the former, general modeling architecture, it actually suffers from design decisions built into the latter, both in terms of translation accuracy and diversity. Given these findings, we demonstrate how a more principled approach to model design, based on our empirical findings and general knowledge of software development, can lead to better solutions. Our findings also lend strong support to the recent trend towards synthesizing edits of code conditional on the buggy context, to repair bugs. We implement such models ourselves as "proof-of-concept" tools and empirically confirm that they behave in a fundamentally different, more effective way than the studied translation-based architectures. Overall, our results demonstrate the merit of studying the intricacies of machine-learned models in software engineering: not only can this help elucidate potential issues that may be overshadowed by increases in accuracy; it can also help innovate on these models to raise the state of the art further. We will publicly release our replication data and materials at https://github.com/ARiSE-Lab/Patch-as-translation.
