论文标题
一种基于区块链的方法,用于节省和跟踪差异私人关系成本
A Blockchain-Based Approach for Saving and Tracking Differential-Privacy Cost
论文作者
论文摘要
现在正在收集越来越多的用户敏感信息以进行分析。为了保护用户的隐私,在文献中广泛研究了差异隐私。具体而言,差异化算法会在查询的真实答案中添加噪音,以产生嘈杂的响应。结果,噪声输出泄漏的有关数据集的信息受隐私参数界定。通常,需要使用数据集来回答多个查询(例如,对于多个分析任务),因此,随着更多查询的回答,隐私保护级别可能会降低。因此,至关重要的是跟踪不应超过给定隐私预算的隐私支出。此外,如果以前已经回答了查询,并且在同一数据集上再次询问了查询,我们可能会重复使用当前查询的先前嘈杂响应以节省隐私成本。鉴于上述内容,我们设计并实施了一个基于区块链的系统,以跟踪和节省差异性私人关系成本。区块链提供了一个分布式不可变的分类帐,该分类帐记录了每种查询类型,用于回答每个查询的嘈杂响应,相关的噪声级别添加到真实查询结果中以及我们系统中的其余隐私预算。此外,由于区块链记录了用于回答每个查询的嘈杂响应,因此,如果反复提出相同的查询,我们还设计了一种算法来重复使用以前的嘈杂响应。具体而言,考虑到同一查询的不同请求可能具有不同的隐私要求,我们的算法(通过严格的证明)能够设置旧噪声响应的最佳重复使用分数,并添加新的噪音(如有必要)以最大程度地减少累积的隐私成本。实验结果表明,所提出的算法可以大大降低隐私成本,而不会损害数据准确性。
An increasing amount of users' sensitive information is now being collected for analytics purposes. To protect users' privacy, differential privacy has been widely studied in the literature. Specifically, a differentially private algorithm adds noise to the true answer of a query to generate a noisy response. As a result, the information about the dataset leaked by the noisy output is bounded by the privacy parameter. Oftentimes, a dataset needs to be used for answering multiple queries (e.g., for multiple analytics tasks), so the level of privacy protection may degrade as more queries are answered. Thus, it is crucial to keep track of the privacy spending which should not exceed the given privacy budget. Moreover, if a query has been answered before and is asked again on the same dataset, we may reuse the previous noisy response for the current query to save the privacy cost. In view of the above, we design and implement a blockchain-based system for tracking and saving differential-privacy cost. Blockchain provides a distributed immutable ledger that records each query's type, the noisy response used to answer each query, the associated noise level added to the true query result, and the remaining privacy budget in our system. Furthermore, since the blockchain records the noisy response used to answer each query, we also design an algorithm to reuse previous noisy response if the same query is asked repeatedly. Specifically, considering that different requests of the same query may have different privacy requirements, our algorithm (via a rigorous proof) is able to set the optimal reuse fraction of the old noisy response and add new noise (if necessary) to minimize the accumulated privacy cost. Experimental results show that the proposed algorithm can reduce the privacy cost significantly without compromising data accuracy.