Paper Title
On the Use of Deep Learning in Software Defect Prediction
Paper Authors
Abstract
Context: Automated software defect prediction (SDP) methods are increasingly applied, often using machine learning (ML) techniques. Yet existing ML-based approaches require manually extracted features, which are cumbersome and time-consuming to produce and hardly capture the semantic information reported in bug reporting tools. Deep learning (DL) techniques give practitioners the opportunity to automatically extract and learn from more complex and high-dimensional data. Objective: The purpose of this study is to systematically identify, analyze, summarize, and synthesize the current state of the utilization of DL algorithms for SDP in the literature. Method: We systematically selected a pool of 102 peer-reviewed studies and then conducted a quantitative and qualitative analysis using the data extracted from these studies. Results: Main highlights include: (1) most studies applied supervised DL; (2) two thirds of the studies used metrics as input to DL algorithms; (3) the Convolutional Neural Network is the most frequently used DL algorithm. Conclusion: Based on our findings, we propose to (1) develop more comprehensive DL approaches that automatically capture the needed features; (2) use diverse software artifacts other than source code; (3) adopt data augmentation techniques to tackle the class imbalance problem; (4) publish replication packages.