Snoopy: A Webpage Fingerprinting Framework with Finite Query Model for Mass-Surveillance

Paper Title

Snoopy: A Webpage Fingerprinting Framework with Finite Query Model for Mass-Surveillance

Authors

Gargi Mitra, Prasanna Karthik Vairam, Sandip Saha, Nitin Chandrachoodan, V. Kamakoti

Abstract

Internet users are vulnerable to privacy attacks despite the use of encryption. Webpage fingerprinting, an attack that analyzes encrypted traffic, can identify the webpages visited by a user in a given website. Recent research works have been successful in demonstrating webpage fingerprinting attacks on individual users, but have been unsuccessful in extending their attacks to mass-surveillance. The key challenges in performing mass-scale webpage fingerprinting arise from (i) the sheer number of combinations of user behavior and preferences to account for, and (ii) the bound on the number of website queries imposed by the defense mechanisms (e.g., DDoS defense) deployed at the website. These constraints preclude the use of conventional data-intensive ML-based techniques. In this work, we propose Snoopy, a first-of-its-kind framework that performs webpage fingerprinting for a large number of users visiting a website. Snoopy caters to the generalization requirements of mass-surveillance while complying with a bound on the number of website accesses (finite query model) for traffic sample collection. For this, Snoopy uses a feature (i.e., sequence of encrypted resource sizes) that is either unaffected or predictably affected by different browsing contexts (OS, browser, caching, cookie settings). Snoopy uses static analysis techniques to predict the variations caused by factors such as header sizes, MTU, and User Agent String that arise from the diversity in browsing contexts. We show that Snoopy achieves approximately 90% accuracy when evaluated on most websites, across various browsing contexts. A simple ensemble of Snoopy and an ML-based technique achieves approximately 97% accuracy while adhering to the finite query model, in cases when Snoopy alone does not perform well.
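To make the central feature concrete, the following is a minimal sketch of matching an observed sequence of encrypted resource sizes against stored per-page size sequences, allowing a bounded per-resource offset for header-size variation. It is an illustration under stated assumptions, not the algorithm from the paper: the fingerprint table, the function names (matches, identify_page), and the header overhead bounds are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical fingerprint table: page path -> expected resource body sizes in bytes.
# In practice these would come from a limited number of crawls (finite query model).
FINGERPRINTS: Dict[str, List[int]] = {
    "/home": [5120, 20480, 1024],
    "/login": [5120, 2048],
    "/account": [5120, 20480, 4096, 1024],
}


def matches(observed: List[int], expected: List[int],
            header_min: int, header_max: int) -> bool:
    """Return True if every observed size equals the expected body size
    plus a header overhead within [header_min, header_max]."""
    if len(observed) != len(expected):
        return False
    return all(header_min <= o - e <= header_max
               for o, e in zip(observed, expected))


def identify_page(observed: List[int],
                  fingerprints: Dict[str, List[int]],
                  header_min: int = 200,      # assumed lower bound on header bytes
                  header_max: int = 800) -> Optional[str]:
    """Return the unique page whose fingerprint matches, or None if the
    observation matches no page or is ambiguous between several pages."""
    candidates = [page for page, sizes in fingerprints.items()
                  if matches(observed, sizes, header_min, header_max)]
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    # Two encrypted responses whose sizes exceed the "/login" bodies by 350 and 420 bytes.
    print(identify_page([5470, 2468], FINGERPRINTS))
```

Returning None for ambiguous or unmatched observations is a deliberate choice in this sketch: it marks the cases where a fallback, such as the ML-based technique mentioned in the abstract, would be needed.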
