Paper Title
GraphTrack: A Graph-based Cross-Device Tracking Framework
Authors
Abstract
Cross-device tracking has drawn growing attention from both commercial companies and the general public because of its privacy implications and its applications in user profiling, personalized services, etc. One particular, widely used type of cross-device tracking leverages the browsing histories of user devices, e.g., characterized by the list of IP addresses a device uses and the domains it visits. However, existing browsing-history-based methods have three drawbacks. First, they cannot capture latent correlations among IPs and domains. Second, their performance degrades significantly when labeled device pairs are unavailable. Lastly, they are not robust to uncertainties in linking browsing histories to devices. We propose GraphTrack, a graph-based cross-device tracking framework, to track users across different devices by correlating their browsing histories. Specifically, we propose to model the complex interplay among IPs, domains, and devices as graphs and to capture the latent correlations between IPs and between domains. We construct graphs that are robust to uncertainties in linking browsing histories to devices. Moreover, we adapt random walk with restart to compute similarity scores between devices based on the graphs. GraphTrack leverages the similarity scores to perform cross-device tracking. GraphTrack does not require labeled device pairs but can incorporate them if available. We evaluate GraphTrack on two real-world datasets, i.e., a publicly available mobile-desktop tracking dataset (around 100 users) and a multiple-device tracking dataset (154K users) that we collected. Our results show that GraphTrack substantially outperforms state-of-the-art methods on both datasets.
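To make the core idea concrete, the following is a minimal sketch of textbook random walk with restart (RWR) on a small device-IP-domain graph. It is not GraphTrack's actual construction: the abstract does not specify edge weights, how uncertainty is handled, or how RWR is adapted, so this sketch assumes an unweighted, undirected graph and a restart probability of 0.15. All names (`build_graph`, `rwr`, `device_similarity`) and the toy histories are hypothetical.

```python
from collections import defaultdict


def build_graph(histories):
    """Build an undirected, unweighted graph linking devices to the IPs
    and domains in their browsing histories (an illustrative assumption,
    not the paper's graph construction)."""
    adj = defaultdict(set)
    for device, history in histories.items():
        d = ("device", device)
        adj[d]  # make sure the device node exists even with an empty history
        for ip in history.get("ips", []):
            adj[d].add(("ip", ip))
            adj[("ip", ip)].add(d)
        for domain in history.get("domains", []):
            adj[d].add(("domain", domain))
            adj[("domain", domain)].add(d)
    return dict(adj)


def rwr(adj, seed, restart=0.15, max_iters=200, tol=1e-12):
    """Standard random walk with restart from `seed`, by power iteration.
    Returns a dict mapping every node to its visiting probability."""
    p = {n: 0.0 for n in adj}
    p[seed] = 1.0
    for _ in range(max_iters):
        nxt = {n: 0.0 for n in adj}
        for node, mass in p.items():
            neighbors = adj[node]
            if not neighbors:
                # Isolated node: return its walking mass to the seed.
                nxt[seed] += (1.0 - restart) * mass
                continue
            share = (1.0 - restart) * mass / len(neighbors)
            for nb in neighbors:
                nxt[nb] += share
        nxt[seed] += restart  # total mass is 1, so restart mass is `restart`
        change = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if change < tol:
            break
    return p


def device_similarity(adj, device):
    """Score every other device by its RWR probability from `device`."""
    seed = ("device", device)
    p = rwr(adj, seed)
    return {
        name: prob
        for (kind, name), prob in p.items()
        if kind == "device" and name != device
    }


# Toy browsing histories (made-up values for illustration only).
histories = {
    "phone": {"ips": ["1.1.1.1", "2.2.2.2"],
              "domains": ["news.example", "mail.example"]},
    "laptop": {"ips": ["1.1.1.1"],
               "domains": ["news.example", "shop.example"]},
    "tablet": {"ips": ["3.3.3.3"], "domains": ["video.example"]},
}
graph = build_graph(histories)
scores = device_similarity(graph, "phone")
```

In this toy example, `laptop` shares an IP and a domain with `phone` and therefore receives a positive score, while `tablet` shares nothing and is unreachable from `phone`. A tracker could then pair devices whose scores exceed a chosen threshold; how GraphTrack actually turns scores into pairings is not described in the abstract.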