论文标题
在联合频率估计中的安全性和隐私的沟通成本
The communication cost of security and privacy in federated frequency estimation
论文作者
论文摘要
我们考虑了联合频率估计问题,其中每个用户都将私有项目$ x_i $从尺寸-$ d $域中持有,并且服务器旨在估算带有$ n \ ll d $的$ n $项目的经验频率(即直方图)。没有任何安全性和隐私考虑,每个用户可以使用$ \ log d $ bits将其项目传达给服务器。但是,安全的汇总协议的幼稚应用将需要每个用户的$ d \ log n $位。我们可以减少安全汇总所需的沟通,并且安全性是否具有沟通的基本成本? 在本文中,我们为安全汇总开发了一个信息理论模型,使我们能够以通信的安全和隐私的基本成本来表征。我们表明,使用安全性(并且没有隐私)$ω\ left(n \ log d \ right)$ bits $ bits $ bits是必需的,足以允许服务器计算频率分布。这明显小于NAIVE方案所需的每个用户的$ d \ log n $位,但显着高于无安全性所需的每个用户所需的$ \ log d $位。为了实现差异隐私,我们基于嘈杂的草图构建线性方案,该方案在本地散布数据并且不需要受信任的服务器(又称分布式差异隐私)。我们在$ \ ell_2 $和$ \ ell_ \ infty $损失下分析此方案。通过使用我们的信息理论框架,我们表明该方案通过最佳的通信成本实现了最佳的准确性私人关系权衡,同时在中央服务器中存储数据的集中式案例中的性能。
We consider the federated frequency estimation problem, where each user holds a private item $X_i$ from a size-$d$ domain and a server aims to estimate the empirical frequency (i.e., histogram) of $n$ items with $n \ll d$. Without any security and privacy considerations, each user can communicate its item to the server by using $\log d$ bits. A naive application of secure aggregation protocols would, however, require $d\log n$ bits per user. Can we reduce the communication needed for secure aggregation, and does security come with a fundamental cost in communication? In this paper, we develop an information-theoretic model for secure aggregation that allows us to characterize the fundamental cost of security and privacy in terms of communication. We show that with security (and without privacy) $Ω\left( n \log d \right)$ bits per user are necessary and sufficient to allow the server to compute the frequency distribution. This is significantly smaller than the $d\log n$ bits per user needed by the naive scheme, but significantly higher than the $\log d$ bits per user needed without security. To achieve differential privacy, we construct a linear scheme based on a noisy sketch which locally perturbs the data and does not require a trusted server (a.k.a. distributed differential privacy). We analyze this scheme under $\ell_2$ and $\ell_\infty$ loss. By using our information-theoretic framework, we show that the scheme achieves the optimal accuracy-privacy trade-off with optimal communication cost, while matching the performance in the centralized case where data is stored in the central server.