Title

Entropy Regularization for Population Estimation

Authors

Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

Abstract

Entropy regularization is known to improve exploration in sequential decision-making problems. We show that this same mechanism can also lead to nearly unbiased and lower-variance estimates of the mean reward in the optimize-and-estimate structured bandit setting. Mean reward estimation (i.e., population estimation) tasks have recently been shown to be essential for public policy settings where legal constraints often require precise estimates of population metrics. We show that leveraging entropy and KL divergence can yield a better trade-off between reward and estimator variance than existing baselines, all while remaining nearly unbiased. These properties of entropy regularization illustrate an exciting potential for bridging the optimal exploration and estimation literatures.
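The mechanism described in the abstract can be made concrete with a small sketch. Everything here is an illustrative assumption, not the authors' algorithm. The synthetic population, the noisy scores, the temperature values, sampling with replacement, and the Hansen-Hurwitz (inverse-probability) estimator are all choices made for this example, and the helper names `regularized_policy`, `hh_variance`, and `hh_estimate` are hypothetical. Maximizing expected reward plus τ times the policy's entropy is equivalent to maximizing expected reward minus τ·KL(π‖uniform), and either objective over the probability simplex yields the softmax policy π_i ∝ q_i·exp(r̂_i/τ). That policy gives every unit a strictly positive sampling probability. As a result, weighting each sampled reward by 1/(N·π_i) gives an unbiased estimate of the population mean in this simplified with-replacement setting.

```python
import numpy as np


def regularized_policy(pred, tau, prior=None):
    """Maximize <pi, pred> - tau * KL(pi || prior) over the probability simplex.

    The maximizer is pi_i proportional to prior_i * exp(pred_i / tau). With a
    uniform prior, KL(pi || uniform) = log(N) - H(pi), so this is the same as
    entropy regularization and reduces to softmax(pred / tau).
    """
    pred = np.asarray(pred, dtype=float)
    if prior is None:
        prior = np.full(pred.shape, 1.0 / pred.size)
    logits = np.log(prior) + pred / tau
    logits -= logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def expected_reward(y, pi):
    """Expected reward collected by a single draw from pi."""
    return float(np.dot(pi, y))


def hh_variance(y, pi, n_draws):
    """Exact variance of the Hansen-Hurwitz mean estimator for n_draws
    i.i.d. draws from pi: (sum_i y_i^2 / (N^2 pi_i) - mean(y)^2) / n_draws."""
    N = y.size
    per_draw = np.sum(y**2 / (N**2 * pi)) - y.mean() ** 2
    return float(per_draw / n_draws)


def hh_estimate(y_drawn, pi_drawn, N):
    """Each draw I contributes y_I / (N * pi_I), whose expectation is mean(y)
    whenever every pi_i > 0, which the softmax policy guarantees."""
    return float(np.mean(y_drawn / (N * pi_drawn)))


rng = np.random.default_rng(0)
N, n_draws = 500, 50
y = rng.uniform(0.0, 1.0, size=N)          # hypothetical true rewards
pred = y + rng.normal(0.0, 0.1, size=N)    # hypothetical noisy model scores

# Reward per draw and estimator variance at a few temperatures.
results = {}
for tau in (0.05, 1.0, 100.0):
    pi = regularized_policy(pred, tau)
    results[tau] = (expected_reward(y, pi), hh_variance(y, pi, n_draws))
    print(f"tau={tau:>6}: E[reward/draw]={results[tau][0]:.3f}  "
          f"Var[estimate]={results[tau][1]:.3g}")

# Monte Carlo check of unbiasedness at a moderate temperature.
pi = regularized_policy(pred, 1.0)
estimates = []
for _ in range(2000):
    idx = rng.choice(N, size=n_draws, p=pi)  # sampling with replacement
    estimates.append(hh_estimate(y[idx], pi[idx], N))
print(f"true mean={y.mean():.4f}  mean of estimates={np.mean(estimates):.4f}")
```

In this toy setup, a very small τ (near-greedy sampling) concentrates draws on top-scoring units and raises the reward per draw. However, low-scoring units then receive vanishingly small probabilities and enormous weights whenever they are drawn, so the estimator's variance blows up. A moderate τ keeps every probability bounded away from zero. Passing a non-uniform `prior` gives the KL-regularized variant.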