Paper Title

Using Consensual Biterms from Text Structures of Requirements and Code to Improve IR-Based Traceability Recovery

Authors

Gao, Hui; Kuang, Hongyu; Sun, Kexin; Ma, Xiaoxing; Egyed, Alexander; Mäder, Patrick; Rong, Guoping; Shao, Dong; Zhang, He

Abstract

Traceability establishes trace links among software artifacts based on whether two artifacts are related by system functionality. These traces are valuable for software development but difficult to obtain manually. To cope with costly and error-prone manual recovery, automated approaches have been proposed that recover traces through textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, the low quality and quantity of artifact texts negatively affect the calculated IR values, greatly hindering the performance of IR-based approaches. In this study, we propose to extract co-occurring word pairs from the text structures of both requirements and code (i.e., consensual biterms) to improve IR-based traceability recovery. We first collect a set of biterms based on the part-of-speech of requirement texts, and then filter them through the code texts. We then use these consensual biterms both to enrich the input corpus for IR techniques and to enhance the calculation of IR values. An evaluation on nine systems shows that, in general, when used solely to enhance IR techniques, our approach outperforms pure IR-based approaches and another baseline by 21.9% and 21.8% in AP, and by 9.3% and 7.2% in MAP, respectively. Moreover, when combined with another enhancement strategy that works from a different perspective, it outperforms this baseline by 5.9% in AP and 4.8% in MAP.
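The idea of consensual biterms described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: a small stopword list stands in for the part-of-speech selection mentioned in the abstract, a biterm is taken to be an unordered pair of adjacent tokens, and the helper names (`tokenize`, `biterms`, `consensual_biterms`, `enrich`) and the example requirement and code texts are hypothetical.

```python
import re

# Assumption: a tiny stopword list replaces the part-of-speech filtering
# that the abstract describes; a real pipeline would use a POS tagger.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "shall", "is", "be", "in"}


def tokenize(text):
    """Split camelCase identifiers, keep alphabetic words, lowercase, drop stopwords."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    words = re.findall(r"[A-Za-z]+", spaced)
    return [w.lower() for w in words if w.lower() not in STOPWORDS]


def biterms(tokens):
    """Return unordered pairs of distinct adjacent tokens (assumed biterm definition)."""
    pairs = set()
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            pairs.add(tuple(sorted((left, right))))
    return pairs


def consensual_biterms(requirement, code):
    """Keep only the requirement biterms that also occur in the code text."""
    return biterms(tokenize(requirement)) & biterms(tokenize(code))


def enrich(text, shared):
    """Append each shared biterm found in `text` as one extra corpus token."""
    tokens = tokenize(text)
    present = biterms(tokens)
    extra = sorted("_".join(pair) for pair in shared if pair in present)
    return tokens + extra


# Hypothetical requirement and code snippet, used only for illustration.
requirement = "The system shall update the patient record"
code = "void updatePatientRecord(Patient patient) { recordStore.save(patient); }"

shared = consensual_biterms(requirement, code)
enriched_requirement = enrich(requirement, shared)
enriched_code = enrich(code, shared)
```

Feeding the enriched token lists to an IR model (for example, a vector space model) would let a shared biterm count as an additional matching term between the requirement and the code. How the biterms enter the IR value calculation in the paper itself is not specified in the abstract, so that step is left out here.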
