Paper Title

Identifying Self-Admitted Technical Debt in Issue Tracking Systems using Machine Learning

Authors

Yikun Li, Mohamed Soliman, Paris Avgeriou

Abstract

Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefit at the expense of the long-term maintainability and evolvability of software. A special type of technical debt is explicitly admitted by software engineers (e.g., using a TODO comment); this is called Self-Admitted Technical Debt, or SATD. Most work on automatically identifying SATD focuses on source code comments. Besides source code comments, issue tracking systems have been shown to be another rich source of SATD, yet there are no approaches specifically for automatically identifying SATD in issues. In this paper, we first create a training dataset by collecting and manually analyzing 4,200 issues (broken down into 23,180 issue sections) from seven open-source projects (Camel, Chromium, Gerrit, Hadoop, HBase, Impala, and Thrift) that use two popular issue tracking systems (Jira and Google Monorail). We then propose and optimize a machine learning approach for automatically identifying SATD in issue tracking systems. Our findings indicate that: 1) our approach outperforms baseline approaches by a wide margin in terms of F1-score; 2) transferring knowledge from suitable datasets can improve the predictive performance of our approach; 3) the extracted SATD keywords are intuitive and may indicate types and indicators of SATD; 4) projects using different issue tracking systems share fewer SATD keywords than projects using the same issue tracking system; 5) only a small amount of training data is needed to achieve good accuracy.
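
The approach is compared against baselines mainly by F1-score, which is the harmonic mean of precision and recall for the SATD class. The sketch below illustrates that metric only. It is not the authors' evaluation code. It assumes that each issue section carries a binary label (1 = SATD, 0 = not SATD); that encoding and the function name `f1_score` are illustrative assumptions.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return F1 = 2 * P * R / (P + R) for the positive (SATD) class.

    Assumed encoding (illustrative only): one label per issue section,
    1 = SATD, 0 = not SATD.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives for the SATD class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    if tp == 0:
        # With no true positives, precision or recall is 0 (or undefined); report F1 = 0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical gold labels and predictions for six issue sections.
    gold = [1, 0, 1, 1, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(round(f1_score(gold, predicted), 3))  # 2 TP, 1 FP, 1 FN -> 0.667
```

With these hypothetical labels, the classifier finds two of the three SATD sections and raises one false alarm. Precision and recall are both 2/3, so F1 is 2/3. Because F1 ignores true negatives, it suits a setting where SATD sections are likely a minority of all issue sections.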