Policy Learning with Competing Agents

Paper Title

Policy Learning with Competing Agents

Authors

Roshni Sahoo, Stefan Wager

Abstract

Decision makers often aim to learn a treatment assignment policy under a capacity constraint on the number of agents that they can treat. When agents can respond strategically to such policies, competition arises, complicating estimation of the optimal policy. In this paper, we study capacity-constrained treatment assignment in the presence of such interference. We consider a dynamic model where the decision maker allocates treatments at each time step and heterogeneous agents myopically best respond to the previous treatment assignment policy. When the number of agents is large but finite, we show that the threshold for receiving treatment under a given policy converges to the policy's mean-field equilibrium threshold. Based on this result, we develop a consistent estimator for the policy gradient. In a semi-synthetic experiment with data from the National Education Longitudinal Study of 1988, we demonstrate that this estimator can be used for learning capacity-constrained policies in the presence of strategic behavior.
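To make the dynamic in the abstract concrete, here is a minimal toy sketch in Python. It is not the model or the estimator from the paper: the response rule `reported_score`, the `gain` parameter, and all numeric settings are hypothetical choices made only for illustration. At each step the decision maker treats the top `q` fraction of agents by score, which fixes a threshold, and each agent then adjusts its score in response to the previous step's threshold.

```python
import math
import random


def normal_pdf(u):
    """Standard normal density, used here as a smooth response weight."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def reported_score(ability, gain, prev_threshold):
    # Hypothetical myopic response (an assumption, not the paper's rule):
    # agents close to the previous threshold raise their score the most.
    return ability + gain * normal_pdf(prev_threshold - ability)


def capacity_threshold(scores, q):
    # Capacity constraint: only the top q fraction can be treated, so the
    # threshold is the score of the last agent who still fits.
    ranked = sorted(scores, reverse=True)
    k = max(1, round(q * len(ranked)))
    return ranked[k - 1]


def simulate_thresholds(n=2000, q=0.3, steps=30, seed=0):
    rng = random.Random(seed)
    # Heterogeneous agents: each has its own ability and response strength.
    ability = [rng.gauss(0.0, 1.0) for _ in range(n)]
    gain = [rng.uniform(0.0, 1.0) for _ in range(n)]
    threshold = 0.0  # arbitrary starting threshold
    path = []
    for _ in range(steps):
        # Agents best respond to the threshold from the previous step.
        scores = [reported_score(a, g, threshold) for a, g in zip(ability, gain)]
        # The decision maker then sets the threshold that fills capacity.
        threshold = capacity_threshold(scores, q)
        path.append(threshold)
    return path


path = simulate_thresholds()
print(path[-1], abs(path[-1] - path[-2]))
```

In this toy setup each agent's score changes by at most about a quarter of any change in the threshold, so the threshold update is a contraction and the path settles to a fixed point. That fixed point is only a finite-sample analogue of the equilibrium threshold the abstract refers to. The paper's setting also includes features, such as the learned policy and its gradient, that this sketch leaves out.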
