Paper title
Truthful Data Acquisition via Peer Prediction
Paper authors
Paper abstract
We consider the problem of purchasing data for machine learning or statistical estimation. The data analyst has a budget to purchase datasets from multiple data providers. She does not have any test data that can be used to evaluate the collected data and can assign payments to data providers solely based on the collected datasets. We consider the problem in the standard Bayesian paradigm and in two settings: (1) data are only collected once; (2) data are collected repeatedly and each day's data are drawn independently from the same distribution. For both settings, our mechanisms guarantee that truthfully reporting one's dataset is always an equilibrium by adopting techniques from peer prediction: pay each provider the mutual information between his reported data and other providers' reported data. Depending on the data distribution, the mechanisms can also discourage misreports that would lead to inaccurate predictions. Our mechanisms also guarantee individual rationality and budget feasibility for certain underlying distributions in the first setting and for all distributions in the second setting.
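The payment rule described in the abstract (pay each provider the mutual information between his reported data and the other providers' reported data) can be illustrated with a small sketch. This is not the paper's mechanism: it assumes discrete reports collected over several days, and it swaps in a simple plug-in (empirical frequency) estimate of mutual information in place of the Bayesian quantity the paper works with. The function names `empirical_mutual_information` and `peer_payments`, and the `scale` parameter, are hypothetical.

```python
from collections import Counter
from math import log


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples.

    Assumption: this empirical estimator stands in for the Bayesian
    mutual information used in the paper.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * log(count * n / (px[x] * py[y]))
    return mi


def peer_payments(reports, scale=1.0):
    """Pay each provider scale * I(own reports; all other providers' reports).

    reports: dict mapping a provider name to a list of discrete reports,
    one per day, all lists of equal length (hypothetical input format).
    """
    payments = {}
    for name, own in reports.items():
        others = [r for other, r in reports.items() if other != name]
        # Combine the other providers' reports into one tuple per day.
        peers = list(zip(*others))
        payments[name] = scale * empirical_mutual_information(own, peers)
    return payments


if __name__ == "__main__":
    reports = {
        "a": [0, 1, 0, 1],
        "b": [0, 1, 0, 1],
        "c": [0, 0, 0, 0],  # uninformative constant report
    }
    for provider, payment in peer_payments(reports).items():
        print(provider, round(payment, 4))
```

In this toy input, providers `a` and `b` report perfectly correlated data and each receive a positive payment, while `c` reports a constant that carries no information about the others and receives zero. The sketch only illustrates the shape of the payment rule; it does not reproduce the equilibrium, individual rationality, or budget feasibility guarantees stated in the abstract.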