论文标题
学习离散分布:用户与项目级隐私
Learning discrete distributions: user vs item-level privacy
论文作者
论文摘要
关于差异隐私的许多文献都集中在项目级别的隐私上,在这种情况下,目的是为每个项目或培训示例提供隐私。但是,最近许多实际应用(例如联合学习)需要为单个用户的所有项目保留隐私,这很难实现。因此,了解用户级隐私的理论限制变得至关重要。 我们研究了具有用户级差异隐私的$ k $符号上的离散分布的基本问题。如果每个用户都有$ m $样本,我们表明拉普拉斯或高斯机制的直接应用要求用户数量为$ \ nathcal {o}(k/(mα^2) + k/εα$,以实现$ \ ell_1 $ $ a $ $ a $ nrupation $ knture $ k $ k.1 $ a的$ k $ k. $ m $。此外,我们表明,任何仅在最终汇总计数上运行的机制都应需要相同顺序的用户复杂性。 We then propose a mechanism such that the number of users scales as $\tilde{\mathcal{O}}(k/(mα^2) + k/\sqrt{m}εα)$ and hence the privacy penalty is $\tildeΘ(\sqrt{m})$ times smaller compared to the standard mechanisms in certain settings of interest.我们进一步表明,在某些政权下,所提出的机制几乎是最佳的。 我们还提出了一般技术,以在受限的差异私有估计器上获得下限,并且对二项式分布之间的总差异进行了下限,这两者都可能具有独立的利益。
Much of the literature on differential privacy focuses on item-level privacy, where loosely speaking, the goal is to provide privacy per item or training example. However, recently many practical applications such as federated learning require preserving privacy for all items of a single user, which is much harder to achieve. Therefore understanding the theoretical limit of user-level privacy becomes crucial. We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy. If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(mα^2) + k/εα)$ to achieve an $\ell_1$ distance of $α$ between the true and estimated distributions, with the privacy-induced penalty $k/εα$ independent of the number of samples per user $m$. Moreover, we show that any mechanism that only operates on the final aggregate counts should require a user complexity of the same order. We then propose a mechanism such that the number of users scales as $\tilde{\mathcal{O}}(k/(mα^2) + k/\sqrt{m}εα)$ and hence the privacy penalty is $\tildeΘ(\sqrt{m})$ times smaller compared to the standard mechanisms in certain settings of interest. We further show that the proposed mechanism is nearly-optimal under certain regimes. We also propose general techniques for obtaining lower bounds on restricted differentially private estimators and a lower bound on the total variation between binomial distributions, both of which might be of independent interest.