Reliability and Robustness analysis of Machine Learning based Phishing URL Detectors

Title

Reliability and Robustness analysis of Machine Learning based Phishing URL Detectors

Authors

Bushra Sabir, M. Ali Babar, Raj Gaire, Alsharif Abuadbba

Abstract

ML-based Phishing URL (MLPU) detectors serve as the first line of defence that protects users and organisations from falling victim to phishing attacks. Recently, a few studies have launched successful adversarial attacks against specific MLPU detectors, raising questions about their practical reliability and usage. Nevertheless, the robustness of these systems has not been extensively investigated. As a result, their security vulnerabilities remain largely unknown, which calls for systematic robustness testing. In this article, we proposed a methodology to investigate the reliability and robustness of 50 representative state-of-the-art MLPU models. First, we proposed a cost-effective adversarial URL generator, URLBUG, and used it to create an adversarial URL dataset. Subsequently, we reproduced 50 MLPU systems (traditional ML and deep learning) and recorded their baseline performance. Lastly, we tested these MLPU systems on the adversarial dataset and analyzed their robustness and reliability using box plots and heat maps. Our results showed that the generated adversarial URLs have valid syntax and can be registered at a median annual price of $11.99. Of the generated adversarial URLs, 13% were already registered, and 63.94% of those were used for malicious purposes. Moreover, the median Matthews Correlation Coefficient (MCC) of the considered MLPU models dropped from 0.92 to 0.02 when tested against the adversarial dataset (Adv_data), indicating that the baseline MLPU models are unreliable in their current form. Further, our findings identified several security vulnerabilities of these systems and provided future directions for researchers to design dependable and secure MLPU systems.
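The abstract measures robustness as a drop in the Matthews Correlation Coefficient, MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from -1 to 1, with 0 meaning no better than chance. As a reminder of what a drop from 0.92 to 0.02 means, here is a minimal Python sketch of MCC computed from binary confusion-matrix counts. The function name `mcc` and the counts used below are hypothetical: they were chosen only so the results land at the abstract's two median values and are not taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Hypothetical helper for illustration. When any marginal sum is zero the
    formula is undefined; this sketch returns 0.0 in that case, a common
    convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Illustrative (made-up) counts on a balanced set of 1,000 phishing and
# 1,000 benign URLs.
baseline = mcc(tp=960, tn=960, fp=40, fn=40)      # strong detector
adversarial = mcc(tp=510, tn=510, fp=490, fn=490)  # near-chance detector
print(f"{baseline:.2f} {adversarial:.2f}")
```

With these illustrative counts, the second detector still labels 51% of the 2,000 URLs correctly. Its MCC of 0.02, however, shows that its predictions are barely correlated with the true labels. MCC uses all four confusion-matrix cells, so it exposes this collapse more plainly than accuracy alone.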
