Paper title

Generating synthetic mobility data for a realistic population with RNNs to improve utility and privacy

Authors

Alex Berke, Ronan Doorley, Kent Larson, Esteban Moro

Abstract

Location data collected from mobile devices represent mobility behaviors at individual and societal levels. These data have important applications ranging from transportation planning to epidemic modeling. However, issues must be overcome to best serve these use cases: the data often represent a limited sample of the population, and use of the data jeopardizes privacy. To address these issues, we present and evaluate a system for generating synthetic mobility data using a deep recurrent neural network (RNN) trained on real location data. The system takes a population distribution as input and generates mobility traces for a corresponding synthetic population. Related generative approaches have not solved the challenges of capturing both the patterns and variability in individuals' mobility behaviors over longer time periods, while also balancing the generation of realistic data with privacy. Our system leverages RNNs' ability to generate complex and novel sequences while retaining patterns from training data. The model also introduces randomness that is used to calibrate the variation between the synthetic and real data at the individual level, both to capture variability in human mobility and to protect user privacy. Location-based services (LBS) data from more than 22,700 mobile devices were used in an experimental evaluation across utility and privacy metrics. We show that the generated mobility data retain the characteristics of the real data while varying from the real data at the individual level, and that the amount of this variation matches the variation within the real data.
