Paper Title

Semi-Private Computation of Data Similarity with Applications to Data Valuation and Pricing

Authors

René Bødker Christensen, Shashi Raj Pandey, Petar Popovski

Abstract

Consider two data providers that want to contribute data to a learning model. Recent work has shown that the value of one provider's data depends on its similarity to the data owned by the other provider. It would thus be beneficial if the two providers could calculate the similarity of their data while keeping the actual data private. In this work, we devise multiparty computation protocols that compute the similarity of two data sets based on correlation, while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop methods to compute the exact and the approximate correlation, respectively, with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximate case, and analyse the resulting errors for practical parameter choices.
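To make the target quantity concrete, the following minimal sketch computes a correlation coefficient between two equally long data vectors in a single pass, so its cost is linear in the number of samples. It assumes the Pearson coefficient as the similarity measure (the abstract only says "correlation"), and it runs in the clear with no privacy protection, so it is not the protocol proposed in the paper. The function name `pearson_correlation` is hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Plain (non-private) Pearson correlation, computed in one pass.

    Illustrative assumption: Pearson correlation is the similarity measure.
    This is NOT the paper's privacy-preserving protocol.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")

    n = len(x)
    sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0.0
    for a, b in zip(x, y):
        sum_x += a       # depends only on the first provider's data
        sum_xx += a * a  # depends only on the first provider's data
        sum_y += b       # depends only on the second provider's data
        sum_yy += b * b  # depends only on the second provider's data
        sum_xy += a * b  # the only term that mixes both providers' data

    cov = n * sum_xy - sum_x * sum_y
    var_x = n * sum_xx - sum_x * sum_x
    var_y = n * sum_yy - sum_y * sum_y
    if var_x <= 0 or var_y <= 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(pearson_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

In this formulation only the cross term `sum_xy` requires both data sets at once; every other sum can be computed locally by one provider. That is the part any privacy-preserving computation of this quantity would have to handle jointly.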
