Paper Title

Differentially Private Shapley Values for Data Evaluation

Paper Authors

Lauren Watson, Rayna Andreeva, Hao-Tsung Yang, Rik Sarkar

Paper Abstract

The Shapley value has been proposed as a solution to many applications in machine learning, including for equitable valuation of data. Shapley values are computationally expensive and involve the entire dataset. The query for a point's Shapley value can also compromise the statistical privacy of other data points. We observe that in machine learning problems such as empirical risk minimization, and in many learning algorithms (such as those with uniform stability), a diminishing returns property holds, where the marginal benefit per data point decreases rapidly with data sample size. Based on this property, we propose a new stratified approximation method called the Layered Shapley Algorithm. We prove that this method operates on small ($O(\mathrm{polylog}(n))$) random samples of data and small ($O(\log n)$-sized) coalitions to achieve results with guaranteed probabilistic accuracy, and can be modified to incorporate differential privacy. Experimental results show that the algorithm correctly identifies high-value data points that improve validation accuracy, and that the differentially private evaluations preserve the approximate ranking of data.
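The sampling idea described in the abstract can be illustrated with a rough sketch. The standard Shapley value of a point $i$ can be written as $\phi_i = \frac{1}{n}\sum_{k=0}^{n-1}\mathbb{E}_{|S|=k}\left[U(S\cup\{i\}) - U(S)\right]$, averaging the marginal gain over coalition sizes; a stratified estimator can truncate this average to small coalition sizes and add Laplace noise for a differentially private release. The code below is a minimal illustrative sketch under these assumptions; the utility function, the $O(\log n)$ size cap, the per-stratum sample count, and the sensitivity bound are placeholders, and this is not the paper's exact Layered Shapley Algorithm.

```python
# Minimal illustrative sketch (assumptions as noted above; NOT the paper's exact
# Layered Shapley Algorithm): estimate the Shapley value of one data point by
# stratifying over coalition sizes, drawing only small coalitions (size at most
# ~log2(n)), and optionally adding Laplace noise to the released estimate.

import math
import random
from typing import Callable, Optional, Sequence


def stratified_shapley_estimate(
    target: int,
    data_indices: Sequence[int],
    utility: Callable[[frozenset], float],
    samples_per_stratum: int = 30,
    epsilon: Optional[float] = None,
    sensitivity: float = 1.0,
    seed: int = 0,
) -> float:
    """Truncated, stratified Monte Carlo estimate of
    phi_target = (1/n) * sum_k E_{|S|=k}[U(S + {target}) - U(S)],
    averaging only over small coalition sizes k = 0 .. ceil(log2(n))."""
    rng = random.Random(seed)
    others = [i for i in data_indices if i != target]
    n = len(others)
    max_size = min(n, max(1, math.ceil(math.log2(n + 1))))  # small coalitions only

    stratum_means = []
    for k in range(max_size + 1):
        gains = []
        for _ in range(samples_per_stratum):
            coalition = frozenset(rng.sample(others, k))
            gains.append(utility(coalition | {target}) - utility(coalition))
        stratum_means.append(sum(gains) / len(gains))

    estimate = sum(stratum_means) / len(stratum_means)

    if epsilon is not None:
        # Laplace mechanism with an assumed sensitivity bound on the estimate:
        # Laplace(0, b) noise generated as b * (Exp(1) - Exp(1)).
        scale = sensitivity / epsilon
        estimate += scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return estimate


if __name__ == "__main__":
    # Toy utility: a saturating function of coalition size, standing in for the
    # validation accuracy of a model trained on the coalition.
    toy_utility = lambda S: min(len(S), 5) / 5.0
    print(stratified_shapley_estimate(target=0, data_indices=range(100), utility=toy_utility))
```

In a data-valuation setting, one would rank points by such estimates; the paper's experimental claim is that the differentially private estimates approximately preserve this ranking.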
