Paper Title
Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies
Paper Authors
Paper Abstract
Standard reinforcement learning (RL) aims to find an optimal policy that identifies the best action for each state. However, in healthcare settings, many actions may be near-equivalent with respect to the reward (e.g., survival). We consider an alternative objective -- learning set-valued policies that capture near-equivalent actions leading to similar cumulative rewards. We propose a model-free algorithm based on temporal difference learning and a near-greedy heuristic for action selection. We analyze the theoretical properties of the proposed algorithm, provide optimality guarantees, and demonstrate our approach on simulated environments and a real clinical task. Empirically, the proposed algorithm exhibits good convergence properties and discovers meaningful near-equivalent actions. Our work provides theoretical, as well as practical, foundations for clinician/human-in-the-loop decision making, in which humans (e.g., clinicians, patients) can incorporate additional knowledge (e.g., side effects, patient preference) when selecting among near-equivalent actions.
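To make the near-greedy idea concrete, below is a minimal Python sketch of how a set-valued policy could be read off a learned Q-table: after standard temporal-difference updates, every action whose value lies within a tolerance of the state's best value is returned as near-equivalent. The names (`zeta`, `near_equivalent_actions`), the additive tolerance, and the toy dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def near_equivalent_actions(q_row, zeta=0.05):
    """Return all actions whose Q-value is within `zeta` of the state's best.

    `zeta` is a hypothetical near-greediness tolerance; the paper's actual
    criterion may differ.
    """
    best = np.max(q_row)
    return np.flatnonzero(q_row >= best - zeta)

def td_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One standard tabular Q-learning step; the set-valued policy is
    extracted from Q afterward rather than during learning."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

# Usage: after training, a clinician chooses among the returned action set,
# applying knowledge (side effects, patient preference) the reward omits.
Q = np.zeros((10, 4))                 # toy sizes: 10 states, 4 actions
td_update(Q, s=0, a=1, r=1.0, s_next=2)
print(near_equivalent_actions(Q[0]))  # near-equivalent actions at state 0
```

In this sketch the set-valued policy is purely a post-hoc readout of the Q-table; it illustrates the interface a human decision-maker would see, not the algorithm's convergence machinery.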