Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection

Paper Title


Maat: Automatically Analyzing VirusTotal for Accurate Labeling and Effective Malware Detection

Authors

Salem, Aleieldin; Banescu, Sebastian; Pretschner, Alexander

Abstract


The malware analysis and detection research community relies on the online platform VirusTotal to label Android apps based on the scan results of around 60 antivirus scanners. Unfortunately, there are no standards on how to best interpret the scan results acquired from VirusTotal, which leads to the utilization of different threshold-based labeling strategies (e.g., if ten or more scanners deem an app malicious, it is considered malicious). While some of the utilized thresholds may be able to accurately approximate the ground truths of apps, the fact that VirusTotal changes the set and versions of the scanners it uses makes such thresholds unsustainable over time. We implemented a method, Maat, that tackles these issues of standardization and sustainability by automatically generating a Machine Learning (ML)-based labeling scheme, which outperforms threshold-based labeling strategies. Using the VirusTotal scan reports of 53K Android apps that span one year, we evaluated the applicability of Maat's ML-based labeling strategies by comparing their performance against threshold-based strategies. We found that such ML-based strategies (a) can accurately and consistently label apps based on their VirusTotal scan reports, and (b) contribute to training ML-based detection methods that are more effective at classifying out-of-sample apps than their threshold-based counterparts.
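To make the threshold-based baseline that Maat is compared against concrete, here is a minimal Python sketch of the "ten or more scanners" rule from the abstract. It assumes a simplified, hypothetical report format (a mapping from scanner name to a boolean "flagged as malicious" verdict) rather than the actual VirusTotal report schema. The function name `threshold_label` and the scanner names are illustrative only. It does not reproduce Maat's ML-based labeling, whose details are not given here.

```python
from typing import Mapping


def threshold_label(scan_results: Mapping[str, bool], threshold: int = 10) -> int:
    """Label an app with a fixed detection-count threshold.

    `scan_results` is a hypothetical, simplified stand-in for a VirusTotal
    scan report: it maps each scanner's name to True if that scanner
    flagged the app as malicious. Returns 1 (malicious) if at least
    `threshold` scanners flagged the app, otherwise 0 (benign).
    """
    # Count how many scanners flagged the app.
    positives = sum(1 for flagged in scan_results.values() if flagged)
    return 1 if positives >= threshold else 0


if __name__ == "__main__":
    # Hypothetical report: 60 scanners, the first 12 of which flag the app.
    report = {f"scanner_{i}": i < 12 for i in range(60)}
    print(threshold_label(report))                # 12 >= 10, so malicious (1)
    print(threshold_label(report, threshold=15))  # 12 < 15, so benign (0)
```

Because the threshold is an absolute count of scanners, the same value can mean something different once VirusTotal adds, removes, or updates the scanners it runs. This is the sustainability problem the abstract raises.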
