Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification

Paper Title

Exploring Optimal Deep Learning Models for Image-based Malware Variant Classification

Authors

Rikima Mitsuhashi, Takahiro Shinagawa

Abstract

Analyzing a huge amount of malware is a major burden for security analysts. Since emerging malware is often a variant of existing malware, automatically classifying malware into known families greatly reduces a part of their burden. Image-based malware classification with deep learning is an attractive approach for its simplicity, versatility, and affinity with the latest technologies. However, the impact of differences in deep learning models and the degree of transfer learning on the classification accuracy of malware variants has not been fully studied. In this paper, we conducted an exhaustive survey of deep learning models using 24 ImageNet pre-trained models and five fine-tuning parameters, totaling 120 combinations, on two platforms. As a result, we found that the highest classification accuracy was obtained by fine-tuning one of the latest deep learning models with a relatively low degree of transfer learning, and we achieved the highest classification accuracy ever in cross-validation on the Malimg and Drebin datasets. We also confirmed that this trend holds true for the recent malware variants using the VirusTotal 2020 Windows and Android datasets. The experimental results suggest that it is effective to periodically explore optimal deep learning models with the latest models and malware datasets by gradually reducing the degree of transfer learning from half.
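The search space described in the abstract (24 pre-trained models crossed with five fine-tuning settings, giving 120 combinations) can be sketched as a simple grid. This is only an illustration: the abstract does not name the models or the five settings, so the model names, the frozen-layer fractions, and the helper functions below are hypothetical placeholders, not the authors' configuration.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not list the 24 ImageNet
# pre-trained models, so generic names stand in for them.
MODELS = [f"model_{i:02d}" for i in range(1, 25)]

# Hypothetical fine-tuning settings: the fraction of layers kept frozen
# (i.e. transferred unchanged), starting from half and decreasing.
FROZEN_FRACTIONS = [0.5, 0.4, 0.3, 0.2, 0.1]


def build_grid(models, fractions):
    """Return every (model, frozen_fraction) pair to evaluate."""
    return list(product(models, fractions))


def frozen_layer_count(total_layers, frozen_fraction):
    """Number of layers to keep frozen for a model with total_layers layers."""
    return round(total_layers * frozen_fraction)


grid = build_grid(MODELS, FROZEN_FRACTIONS)
print(len(grid))  # 24 models x 5 settings
```

In a real experiment, each pair in `grid` would be trained and cross-validated on the malware image dataset, and the pair with the best accuracy would be kept.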
