Paper Title
Privacy Limitations of Interest-based Advertising on the Web: A Post-mortem Empirical Analysis of Google's FLoC
Paper Authors
Paper Abstract
In 2020, Google announced it would disable third-party cookies in the Chrome browser to improve user privacy. To continue enabling interest-based advertising while mitigating the risks of individualized user tracking, Google proposed FLoC. The FLoC algorithm assigns users to "cohorts" that represent groups of users with similar browsing behaviors, so that ads can be served to users based on their cohort. In 2022, after testing FLoC in a real-world trial, Google canceled the proposal with little explanation. In this work, we provide a post-mortem analysis of two critical privacy risks of FLoC by applying an implementation of FLoC to a browsing dataset collected from over 90,000 U.S. devices over a one-year period. First, we show how, contrary to its privacy goals, FLoC would have enabled cross-site user tracking by providing a unique identifier for users available across sites, similar to the third-party cookies FLoC was meant to improve upon. We show how FLoC cohort ID sequences observed over time can provide this identifier to trackers, even with third-party cookies disabled. We estimate that the share of users in our dataset who could be uniquely identified by their FLoC ID sequences is more than 50% after 3 weeks and more than 95% after 4 weeks. We also show how these risks increase when cohort data are combined with browser fingerprinting, and how our results underestimate the true risks FLoC would have posed in a real-world deployment. Second, we examine the risk of FLoC leaking sensitive demographic information. Although we find statistically significant differences in browsing behaviors between demographic groups, we do not find that FLoC significantly risks exposing race or income information about users in our dataset. Our contributions provide insights and example analyses for future approaches that seek to protect user privacy while monetizing the web.
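The cross-site tracking risk described in the abstract, in which a user's sequence of weekly cohort IDs becomes increasingly unique over time, can be illustrated with a minimal simulation. This sketch is not the paper's methodology: the cohort count, the user count, and the assumption of independent random cohort assignment each week are all simplifications chosen here for illustration (real FLoC IDs, derived from browsing history, would be correlated across weeks).

```python
import random
from collections import Counter

def fraction_unique(n_users, n_cohorts, n_weeks, seed=0):
    """Return the fraction of simulated users whose week-by-week
    cohort ID sequence is unique within the whole population."""
    rng = random.Random(seed)
    # Simplified model: each user draws an independent cohort ID per week.
    seqs = [tuple(rng.randrange(n_cohorts) for _ in range(n_weeks))
            for _ in range(n_users)]
    counts = Counter(seqs)
    return sum(1 for s in seqs if counts[s] == 1) / n_users

# With a population of 90,000 users and tens of thousands of cohorts
# (the cohort count here is an assumed order of magnitude), a single
# week's ID identifies few users, but a few weeks of IDs make almost
# everyone unique -- the longitudinal identifier the abstract describes.
for weeks in (1, 2, 3):
    print(weeks, fraction_unique(90_000, 33_000, weeks))
```

The qualitative behavior, not the exact numbers, is the point: uniqueness rises sharply as the observed sequence lengthens, which is why a tracker that records cohort IDs over several weeks can recover a per-user identifier even with third-party cookies disabled.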