Paper Title
BiLO-CPDP: Bi-Level Programming for Automated Model Discovery in Cross-Project Defect Prediction
Paper Authors
Paper Abstract
Cross-Project Defect Prediction (CPDP), which borrows data from similar projects by combining a transfer learner with a classifier, has emerged as a promising way to predict software defects when the available data about the target project are insufficient. However, developing such a model is challenging because it is difficult to determine the right combination of transfer learner and classifier along with their optimal hyper-parameter settings. In this paper, we propose a tool, dubbed BiLO-CPDP, which is the first of its kind to formulate automated CPDP model discovery from the perspective of bi-level programming. In particular, bi-level programming carries out the optimization at two nested levels in a hierarchical manner. Specifically, the upper-level optimization routine searches for the right combination of transfer learner and classifier, while the nested lower-level optimization routine optimizes the corresponding hyper-parameter settings. To evaluate BiLO-CPDP, we conduct experiments on 20 projects, comparing it with 21 existing CPDP techniques, its own single-level optimization variant, and Auto-Sklearn, a state-of-the-art automated machine learning tool. Empirical results show that BiLO-CPDP achieves better prediction performance than all 21 existing CPDP techniques on 70% of the projects, while being overwhelmingly superior to both Auto-Sklearn and BiLO-CPDP's single-level optimization variant in all cases. Furthermore, the unique bi-level formalization in BiLO-CPDP also permits allocating more budget to the upper level, which significantly boosts performance.
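The nested search described in the abstract can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not BiLO-CPDP's implementation: the transfer learner and classifier names, their hyper-parameter ranges, and `toy_objective` are hypothetical placeholders, plain random search stands in at both levels for whatever optimizer the tool actually uses, and the budget is assumed to be a count of objective evaluations. The objective is assumed to score a fully configured model on the target project (higher is better).

```python
import random
from typing import Callable, Dict, Optional, Tuple

# Hypothetical search spaces: component names and ranges are illustrative only.
Space = Dict[str, Tuple[float, float]]
TRANSFER_LEARNERS: Dict[str, Space] = {
    "instance_weighting": {"k": (1.0, 10.0)},
    "feature_normalization": {"alpha": (0.0, 1.0)},
}
CLASSIFIERS: Dict[str, Space] = {
    "logistic_regression": {"C": (0.01, 10.0)},
    "random_forest": {"max_depth": (1.0, 20.0)},
}

Config = Dict[str, float]
# Objective: scores a configured (transfer learner, classifier) pair; higher is better.
Objective = Callable[[str, str, Config], float]


def sample(space: Space, rng: random.Random) -> Config:
    """Draw one hyper-parameter setting uniformly from each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}


def lower_level(tl: str, clf: str, objective: Objective,
                budget: int, rng: random.Random) -> Tuple[Config, float]:
    """Lower level: tune hyper-parameters of a FIXED transfer learner/classifier pair."""
    space = {**TRANSFER_LEARNERS[tl], **CLASSIFIERS[clf]}
    best_cfg: Config = {}
    best_score = float("-inf")
    for _ in range(budget):
        cfg = sample(space, rng)
        score = objective(tl, clf, cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def bilevel_search(objective: Objective, upper_budget: int,
                   lower_budget: int, seed: int = 0) -> Tuple[str, str, Config, float]:
    """Upper level: pick pairs; each pair is judged by its lower-level optimum."""
    if upper_budget < 1 or lower_budget < 1:
        raise ValueError("both budgets must be positive")
    rng = random.Random(seed)
    pairs = [(t, c) for t in TRANSFER_LEARNERS for c in CLASSIFIERS]
    best: Optional[Tuple[str, str, Config, float]] = None
    for _ in range(upper_budget):
        tl, clf = rng.choice(pairs)
        cfg, score = lower_level(tl, clf, objective, lower_budget, rng)
        if best is None or score > best[3]:
            best = (tl, clf, cfg, score)
    assert best is not None
    return best


def toy_objective(tl: str, clf: str, cfg: Config) -> float:
    """Hypothetical stand-in for a real metric (e.g. AUC on the target project)."""
    bonus = 0.2 if (tl, clf) == ("feature_normalization", "logistic_regression") else 0.0
    c = cfg.get("C", 5.0)  # pairs without "C" get a fixed, mediocre score
    return 0.6 + bonus - 0.01 * (c - 1.0) ** 2


if __name__ == "__main__":
    # Total evaluations = upper_budget * lower_budget.
    print(bilevel_search(toy_objective, upper_budget=50, lower_budget=20))
```

Each upper-level candidate is judged only after its own lower-level tuning, so a pair is not discarded merely because of poor default hyper-parameters. With evaluation counts as the budget, the total cost is `upper_budget * lower_budget`; shifting a fixed total toward `upper_budget` explores more pairs, which is the kind of reallocation the abstract reports as beneficial. The sketch only exposes that knob and makes no claim about its effect on real CPDP data.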