Paper Title
A Tent Lévy Flying Sparrow Search Algorithm for Feature Selection: A COVID-19 Case Study
Authors
Abstract
The "curse of dimensionality" brought about by the rapid development of information science can have a negative impact when dealing with big datasets. In this paper, we propose a variant of the sparrow search algorithm (SSA), called the Tent Lévy flying sparrow search algorithm (TFSSA), and use it in wrapper mode to select the best subset of features for classification. SSA is a recently proposed algorithm that has not yet been systematically applied to feature selection problems. After verification on the CEC2020 benchmark functions, TFSSA is used to select the feature combination that maximizes classification accuracy while minimizing the number of selected features. The proposed TFSSA is compared with nine algorithms from the literature, using nine evaluation metrics to assess their performance on twenty-one datasets from the UCI repository. Furthermore, the approach is applied to a coronavirus disease (COVID-19) dataset, where it obtains the best average classification accuracy and the best average number of selected features, 93.47% and 2.1 respectively. Experimental results confirm the advantages of the proposed algorithm over other wrapper-based algorithms in improving classification accuracy and reducing the number of selected features.
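The two goals named in the abstract (higher classification accuracy, fewer selected features) are often folded into a single wrapper fitness value that a search algorithm then minimizes. The following is a minimal sketch of that idea, not the authors' implementation: the weight `alpha`, the leave-one-out 1-nearest-neighbour classifier, the function names, and the toy data are all assumptions made for illustration.

```python
def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only sees the features whose mask entry is truthy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # no features selected: treat as worst accuracy
    correct = 0
    for i in range(len(X)):
        best_dist, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue  # hold out the sample being classified
            dist = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)


def wrapper_fitness(X, y, mask, alpha=0.99):
    """Weighted sum of error rate and selected-feature ratio (lower is better).
    alpha is an assumed trade-off weight, not a value taken from the paper."""
    error = 1.0 - loo_1nn_accuracy(X, y, mask)
    ratio = sum(mask) / len(mask)
    return alpha * error + (1.0 - alpha) * ratio


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 5.1], [1.1, 1.1], [1.2, 9.1]]
y = [0, 0, 0, 1, 1, 1]

fitness_informative = wrapper_fitness(X, y, [1, 0])  # keep only feature 0
fitness_all = wrapper_fitness(X, y, [1, 1])          # keep both features
```

On this toy data the mask that keeps only the informative feature scores lower (better) than the mask that keeps both, which is the behaviour a wrapper-based search such as TFSSA would exploit when exploring binary feature masks.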