Paper Title
Beyond the Front Page: Measuring Third Party Dynamics in the Field
Paper Authors
Paper Abstract
In the modern Web, service providers often rely heavily on third parties to run their services. For example, they use ad networks to finance their services, externally hosted libraries to develop features quickly, and analytics providers to gain insights into visitor behavior. For security and privacy, website owners need to be aware of the content they provide to their users. In reality, however, they often do not know which third parties are embedded, for example, when these third parties request additional content, as is common in real-time ad auctions. In this paper, we present a large-scale measurement study to analyze the magnitude of these new challenges. To better reflect the connectedness of third parties, we measured their relations in a model we call third party trees, which approximates the loading dependencies of all third parties embedded into a given website. Using this concept, we show that including a single third party can lead to subsequent requests from up to eight additional services. Furthermore, our findings indicate that the set of third parties embedded during a page load is not always deterministic, as 50% of the branches in the third party trees change between repeated visits. In addition, we found that 93% of the analyzed websites embedded third parties located in regions that might not be in line with the current legal framework. Our study also replicates previous work that mostly focused on the landing pages of websites. We show that this method can only measure a lower bound, as subsites show a significant increase in privacy-invasive techniques. For example, our results show an increase of about 36% in cookie usage when crawling websites more deeply.
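To make the third party tree idea more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a crawl yields (initiator, loaded) domain pairs in load order. It attaches each domain to the first party that loaded it, collects everything transitively loaded by one embedded party (the "up to eight additional services" measurement), and compares root-to-leaf branches across two visits. The functions `build_tree`, `descendants`, `branches`, and `changed_branch_ratio`, the `.test` domains, and the symmetric-difference definition of a "changed branch" are all hypothetical illustrations. The paper's exact metric may differ.

```python
from collections import defaultdict

# Hypothetical input format: (parent, child) pairs in the order they were
# observed, where `parent` initiated the load of `child`. The first party
# (the visited site) is the root of the tree.
Edge = tuple[str, str]


def build_tree(root: str, edges: list[Edge]) -> dict[str, set[str]]:
    """Map each domain to the domains it loaded, keeping a tree shape.

    A domain is attached only to the first parent seen loading it, and only
    if that parent is already reachable from the root (assumes load order).
    """
    children: dict[str, set[str]] = defaultdict(set)
    seen = {root}
    for parent, child in edges:
        if parent in seen and child not in seen:
            children[parent].add(child)
            seen.add(child)
    return dict(children)


def descendants(tree: dict[str, set[str]], node: str) -> set[str]:
    """All domains loaded, directly or indirectly, because `node` was embedded."""
    result: set[str] = set()
    stack = [node]
    while stack:
        for child in tree.get(stack.pop(), ()):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result


def branches(tree: dict[str, set[str]], root: str) -> set[tuple[str, ...]]:
    """Enumerate every root-to-leaf path (branch) as a tuple of domains."""
    paths: set[tuple[str, ...]] = set()
    stack: list[tuple[str, ...]] = [(root,)]
    while stack:
        path = stack.pop()
        kids = tree.get(path[-1], ())
        if not kids:
            paths.add(path)
        for kid in kids:
            stack.append(path + (kid,))
    return paths


def changed_branch_ratio(tree_a, tree_b, root: str) -> float:
    """Share of distinct branches that appear in only one of two visits."""
    a, b = branches(tree_a, root), branches(tree_b, root)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0


# Usage with made-up data: an ad script pulls in different bidders per visit.
visit1 = [("example.com", "ads.test"), ("ads.test", "bidder1.test"),
          ("ads.test", "bidder2.test"), ("example.com", "cdn.test")]
visit2 = [("example.com", "ads.test"), ("ads.test", "bidder3.test"),
          ("example.com", "cdn.test")]
tree1 = build_tree("example.com", visit1)
tree2 = build_tree("example.com", visit2)
print(len(descendants(tree1, "ads.test")))  # 2
print(changed_branch_ratio(tree1, tree2, "example.com"))  # 0.75
```

Attaching each domain only to its first observed parent is a simplification that keeps the structure a tree. A real crawler would derive parent links from request initiator information, such as script stacks or redirect chains, and may need to handle domains that are loaded from several places.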