论文标题
数字痕迹的顶级查询
Top-k queries over digital traces
论文作者
论文摘要
社会和移动技术的最新进展使大量数字痕迹(以移动设备的形式,移动设备与特定WiFi热点的协会的形式)揭示了各种实体集合(例如,人类,设备和车辆)的物理存在历史。一个具有挑战性但重要的任务是根据其数字痕迹确定与给定查询实体最紧密相关的K实体。我们提出了一套索引技术和算法,以在大规模上为此问题提供快速查询处理。我们首先定义了一个通用的函数家族,以测量实体之间的关联,然后提出算法将数字痕迹转换为较低维空间以进行更有效的计算。随后,我们设计了一个层次索引结构,以紧密相关的实体倾向于一起组织实体。然后,我们开发使用索引来处理TOP-K查询的算法。我们理论上根据我们在现实生活中提出和验证的迁移率模型分析了提出方法的修剪效率。最后,我们按大规模对合成和真实数据集进行了广泛的实验,在分析和实验上评估了我们的技术的性能,从而确认了我们方法的有效性和优越性,而不是各种参数设置和数据集中的其他适用方法。
Recent advances in social and mobile technology have enabled an abundance of digital traces (in the form of mobile check-ins, association of mobile devices to specific WiFi hotspots, etc.) revealing the physical presence history of diverse sets of entities (e.g., humans, devices, and vehicles). One challenging yet important task is to identify k entities that are most closely associated with a given query entity based on their digital traces. We propose a suite of indexing techniques and algorithms to enable fast query processing for this problem at scale. We first define a generic family of functions measuring the association between entities, and then propose algorithms to transform digital traces into a lower-dimensional space for more efficient computation. We subsequently design a hierarchical indexing structure to organize entities in a way that closely associated entities tend to appear together. We then develop algorithms to process top-k queries utilizing the index. We theoretically analyze the pruning effectiveness of the proposed methods based on a mobility model which we propose and validate in real life situations. Finally, we conduct extensive experiments on both synthetic and real datasets at scale, evaluating the performance of our techniques both analytically and experimentally, confirming the effectiveness and superiority of our approach over other applicable approaches across a variety of parameter settings and datasets.