Paper Title
Federated TON_IoT Windows Datasets for Evaluating AI-based Security Applications
Paper Authors
Paper Abstract
Existing cyber security solutions have mostly been developed using knowledge-based models, which often fail to detect new cyber-attack families. With the boom of Artificial Intelligence (AI), especially Deep Learning (DL) algorithms, those security solutions have been augmented with AI models to discover, trace, mitigate or respond to new security incidents. These algorithms demand a large number of heterogeneous data sources to train and validate new security systems. This paper describes new datasets, called TON_IoT, which comprise federated data sources collected from telemetry datasets of IoT services, operating system datasets of Windows and Linux, and datasets of network traffic. The paper introduces the testbed and describes the TON_IoT datasets for Windows operating systems. The testbed was implemented in three layers: edge, fog and cloud. The edge layer involves IoT and network devices, the fog layer contains virtual machines and gateways, and the cloud layer involves cloud services, such as data analytics, linked to the other two layers. These layers were dynamically managed using Software-Defined Networking (SDN) and Network Function Virtualization (NFV), via the VMware NSX and vCloud NFV platforms. The Windows datasets were collected from audit traces of memory, processors, networks, processes and hard disks. The datasets can be used to evaluate various AI-based cyber security solutions, including intrusion detection, threat intelligence and hunting, privacy preservation and digital forensics, because they contain a wide range of recent normal and attack features and observations, as well as authentic ground truth events. The datasets can be publicly accessed from this link [1].