Paper Title
Towards Reproducible Network Traffic Analysis
Authors
Abstract
Analysis techniques are critical for gaining insight into network traffic, given both the growing proportion of encrypted traffic and increasing data rates. Unfortunately, the domain of network traffic analysis suffers from a lack of standardization, leading to incomparable results and barriers to reproducibility. Unlike other disciplines, no standard dataset format exists, forcing researchers and practitioners to create bespoke analysis pipelines for each individual task. Without standardization, researchers cannot compare "apples to apples", which prevents us from knowing with certainty whether a new technique represents a methodological advancement or simply benefits from a different interpretation of a given dataset. In this work, we examine the irreproducibility that arises from the lack of standardization in network traffic analysis. First, we study the literature, highlighting evidence of irreproducible research based on different interpretations of popular public datasets. Next, we investigate the underlying issues that have led to the status quo and prevent reproducible research. Third, we outline the standardization requirements that any solution aiming to fix reproducibility issues must address. We then introduce pcapML, an open source system that increases the reproducibility of network traffic analysis research by enabling metadata information to be directly encoded into raw traffic captures in a generic manner. Finally, we use the standardization pcapML provides to create the pcapML benchmarks, an open source leaderboard website and repository built to track the progress of network traffic analysis methods.