Paper Title

Feature Extraction for Novelty Detection in Network Traffic

Authors

Kun Yang, Samory Kpotufe, Nick Feamster

Abstract

Data representation plays a critical role in the performance of novelty detection (or "anomaly detection") methods in machine learning. The data representation of network traffic often determines the effectiveness of these models as much as the model itself. The wide range of novel events that network operators need to detect (e.g., attacks, malware, new applications, changes in traffic demands) introduces a broad range of possible models and data representations. In each scenario, practitioners must spend significant effort extracting and engineering the features that are most predictive for that situation or application. While anomaly detection is well studied in computer networking, much existing work develops specific models that presume a particular representation, often IPFIX/NetFlow. Yet other representations may result in higher model accuracy, and the rise of programmable networks now makes it more practical to explore a broader range of representations. To facilitate such exploration, we develop a systematic framework, open-source toolkit, and public Python library that make it both possible and easy to extract and generate features from network traffic and to perform an end-to-end evaluation of these representations across the most prevalent modern novelty detection models. We first develop and publicly release an open-source tool, an accompanying Python library (NetML), and an end-to-end pipeline for novelty detection in network traffic. Second, we apply this tool to five different novelty detection problems in networking, across a range of scenarios from attack detection to novel device detection. Our findings offer general insights and guidelines concerning which features appear to be more appropriate for particular situations.
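To make the idea of pairing a flow representation with a novelty detector concrete, the following is a minimal sketch in plain Python. It is not the NetML API and does not reproduce the paper's pipeline: the function `flow_features`, the class `ZScoreNoveltyDetector`, the six chosen statistics, and the synthetic traffic are all assumptions made for illustration, and a simple standardized-distance score stands in for the novelty detection models the paper evaluates.

```python
import random
import statistics
from math import sqrt


def flow_features(packets):
    """Summarize one flow as a fixed-length feature vector.

    `packets` is a time-sorted list of (timestamp_seconds, size_bytes).
    The six statistics below are an illustrative choice, not the paper's.
    """
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    # Inter-arrival times; a single-packet flow gets one zero gap.
    iats = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return [
        times[-1] - times[0],      # flow duration
        float(len(packets)),       # packet count
        statistics.mean(sizes),    # mean packet size
        statistics.pstdev(sizes),  # packet size spread
        statistics.mean(iats),     # mean inter-arrival time
        statistics.pstdev(iats),   # inter-arrival time spread
    ]


class ZScoreNoveltyDetector:
    """Hypothetical stand-in detector: distance from the training mean,
    measured in per-feature standard deviations."""

    def fit(self, X):
        cols = list(zip(*X))
        self.mean_ = [statistics.mean(c) for c in cols]
        # Guard against zero variance so the division below is safe.
        self.std_ = [statistics.pstdev(c) or 1.0 for c in cols]
        return self

    def score(self, x):
        return sqrt(sum(((v - m) / s) ** 2
                        for v, m, s in zip(x, self.mean_, self.std_)))


def synthetic_flow(rng, n_packets, size_range, gap_range):
    """Generate a synthetic flow (assumed data, for demonstration only)."""
    t, packets = 0.0, []
    for _ in range(n_packets):
        packets.append((t, rng.randint(*size_range)))
        t += rng.uniform(*gap_range)
    return packets


rng = random.Random(0)
# "Normal" traffic: short flows of mid-sized packets.
normal = [synthetic_flow(rng, rng.randint(15, 25), (400, 600), (0.05, 0.15))
          for _ in range(50)]
# A "novel" flow: many large packets sent in a rapid burst.
novel = synthetic_flow(rng, 200, (1400, 1400), (0.001, 0.001))

X_train = [flow_features(f) for f in normal]
detector = ZScoreNoveltyDetector().fit(X_train)
train_scores = [detector.score(x) for x in X_train]
novel_score = detector.score(flow_features(novel))
print(novel_score > max(train_scores))
```

Swapping the body of `flow_features` for a different representation (for example, raw packet-size sequences instead of summary statistics) while keeping the detector fixed is the kind of comparison the abstract describes.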
