Paper Title

The Integrity of Machine Learning Algorithms against Software Defect Prediction

Authors

Param Khakhar and Rahul Kumar Dubey

Abstract

The increased computerization of recent years has led to the production of a wide variety of software, and measures are needed to ensure that this software is not defective. Many researchers have worked in this area and have developed machine-learning-based approaches that predict whether software is defective. The problem cannot be solved simply by applying conventional classifiers, because the datasets are highly imbalanced, i.e., the number of defective samples is far smaller than the number of non-defective samples. More sophisticated methods are therefore required. The methods developed by researchers can be broadly classified into resampling-based methods, cost-sensitive learning-based methods, and ensemble learning. This report analyses the performance of the Online Sequential Extreme Learning Machine (OS-ELM) proposed by Liang et al. against several classifiers, such as Logistic Regression, Support Vector Machine, Random Forest, and Naïve Bayes, after oversampling the data. OS-ELM trains faster than conventional deep neural networks and always converges to the globally optimal solution. The comparison is performed on both the original dataset and the oversampled dataset. The oversampling technique used is Cluster-based Over-Sampling with Noise Filtering, which performs better than several state-of-the-art oversampling techniques. The analysis is carried out on three projects from the NASA group: KC1, PC4, and PC3. The metrics used are recall and balanced accuracy. In both scenarios, OS-ELM achieves higher scores than the other classifiers.
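To illustrate why the abstract reports recall and balanced accuracy rather than plain accuracy on imbalanced defect data, the following is a minimal sketch in plain Python. The function name, the toy labels, and the "always predict non-defective" baseline are assumptions made for illustration; they are not taken from the paper or from the KC1, PC3, or PC4 data.

```python
def recall_and_balanced_accuracy(y_true, y_pred):
    """Return (recall, balanced accuracy) for binary labels, where 1 = defective."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defective, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defective, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # clean, passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # clean, flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Balanced accuracy is the mean of the per-class recalls.
    return recall, (recall + specificity) / 2


# Hypothetical toy data: 2 defective modules out of 20.
y_true = [1, 1] + [0] * 18
# Hypothetical baseline that labels every module as non-defective.
always_clean = [0] * 20

accuracy = sum(t == p for t, p in zip(y_true, always_clean)) / len(y_true)
recall, balanced_accuracy = recall_and_balanced_accuracy(y_true, always_clean)
print(accuracy, recall, balanced_accuracy)
```

On this toy data the baseline looks strong on plain accuracy while finding no defective module at all, which is the situation the two reported metrics are meant to expose.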
