论文标题

符号系统的差异隐私,并应用马尔可夫连锁店

Differential Privacy for Symbolic Systems with Application to Markov Chains

论文作者

Chen, Bo, Leahy, Kevin, Jones, Austin, Hale, Matthew

论文摘要

数据驱动的系统正在收集从用户的越来越多的数据,敏感用户数据需要隐私保护。在某些情况下,收集的数据是非数字或象征性的,并且传统的隐私方法,例如添加噪声,不适用,尽管此类系统仍然需要隐私保护。因此,我们提出了一个新的差异隐私框架,用于保护符号系统产生的轨迹。这些轨迹可以表示为有限字母上的单词或字符串。我们开发了新的差异隐私机制,该机制使用可能靠近它的随机词近似敏感的单词。使用修改后的锤距自动机有效地实现了离线机制,以在有限的时间范围内生成整个私有化的输出单词。然后,通过在每个时间段上进行敏感符号并生成一个随机输出符号来实现在线机制。这项工作扩展到马尔可夫连锁店,以生成给定的马尔可夫链本可以产生的差异私有状态序列。开发了统计精度界限来量化这些机制的准确性,并且数值结果验证了这些技术对于英语单词字符串的准确性。

Data-driven systems are gathering increasing amounts of data from users, and sensitive user data requires privacy protections. In some cases, the data gathered is non-numerical or symbolic, and conventional approaches to privacy, e.g., adding noise, do not apply, though such systems still require privacy protections. Accordingly, we present a novel differential privacy framework for protecting trajectories generated by symbolic systems. These trajectories can be represented as words or strings over a finite alphabet. We develop new differential privacy mechanisms that approximate a sensitive word using a random word that is likely to be near it. An offline mechanism is implemented efficiently using a Modified Hamming Distance Automaton to generate whole privatized output words over a finite time horizon. Then, an online mechanism is implemented by taking in a sensitive symbol and generating a randomized output symbol at each timestep. This work is extended to Markov chains to generate differentially private state sequences that a given Markov chain could have produced. Statistical accuracy bounds are developed to quantify the accuracy of these mechanisms, and numerical results validate the accuracy of these techniques for strings of English words.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源