Paper Title

On the Evaluation of NLP-based Models for Software Engineering

Paper Authors

Maliheh Izadi, Matin Nili Ahmadabadi

Paper Abstract

NLP-based models have been increasingly adopted to address SE problems. These models are either employed in the SE domain with little to no change, or they are heavily tailored to source code and its unique characteristics. Many of these approaches are reported to outperform or complement existing solutions. However, an important question arises: "Are these models evaluated fairly and consistently in the SE community?". To answer this question, we reviewed how researchers evaluate NLP-based models for SE problems. The findings indicate that there is currently no consistent, widely accepted protocol for evaluating these models. Different studies assess different aspects of the same task, metrics are chosen ad hoc rather than systematically, and results are collected and interpreted case by case. Consequently, there is a pressing need for a methodological approach to evaluating NLP-based models, one that enables consistent assessment and preserves the possibility of fair and efficient comparison.
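To make the abstract's point about ad hoc metric choices concrete, below is a minimal, hypothetical sketch (not taken from the paper): the same predictions for a generation-style SE task are scored with two metrics commonly picked at researchers' discretion, exact match and token-level F1. All model names, reference strings, and predictions are invented for illustration.

```python
# Hypothetical illustration: how metric choice alone can change a comparison.
from collections import Counter

def exact_match(pred: str, ref: str) -> float:
    """1.0 if the prediction matches the reference exactly, else 0.0."""
    return float(pred.strip() == ref.strip())

def token_f1(pred: str, ref: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens, ref_tokens = pred.split(), ref.split()
    # Count tokens appearing in both, respecting multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

reference = "return the sum of two integers"
predictions = {
    "model_a": "return the sum of two integers",       # verbatim match
    "model_b": "returns sum of the two input integers" # close paraphrase
}

for name, pred in predictions.items():
    print(name,
          f"EM={exact_match(pred, reference):.2f}",
          f"F1={token_f1(pred, reference):.2f}")
# model_a: EM=1.00 F1=1.00
# model_b: EM=0.00 F1=0.77
```

Under exact match, model_b looks like a complete failure; under token-level F1 it looks competitive. A study that reports only one of these metrics, chosen without a systematic rationale, can therefore reach a different conclusion than a study of the same task that reports the other, which is precisely the comparability problem the abstract describes.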
