Paper title
An interpretable planning bot for pancreas stereotactic body radiation therapy
Paper authors
Paper abstract
Pancreas stereotactic body radiotherapy (SBRT) treatment planning requires planners to make sequential, time-consuming interactions with the treatment planning system (TPS) to reach the optimal dose distribution. We seek to develop a reinforcement learning (RL)-based planning bot that systematically addresses complex tradeoffs and achieves high plan quality consistently and efficiently. The focus of pancreas SBRT planning is finding a balance between organ-at-risk (OAR) sparing and planning target volume (PTV) coverage. Planners evaluate dose distributions and make planning adjustments to optimize PTV coverage while adhering to OAR dose constraints. We have formulated these interactions between the planner and the TPS as a finite-horizon RL model. First, planning status features are evaluated based on human planner experience and defined as planning states. Second, planning actions are defined to represent the steps that planners commonly implement to address different planning needs. Finally, we have derived a reward system based on an objective function guided by physician-assigned constraints. The planning bot trained itself with 48 plans augmented from 16 previously treated patients and generated plans for 24 cases in a separate validation set. All 24 bot-generated plans achieved PTV coverage similar to that of the clinical plans while satisfying all clinical planning constraints. Moreover, the knowledge learned by the bot can be visualized and interpreted as consistent with human planning knowledge, and the knowledge maps learned in separate training sessions are consistent, indicating that the learning process is reproducible.
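The state, action, and reward formulation described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `PlanState` features, the two action names, the 33 Gy OAR limit, the penalty weight, the five-step horizon, and the toy `apply_action` function that stands in for a TPS re-optimization. The policy shown is a simple greedy one-step lookahead, not the trained RL policy.

```python
from dataclasses import dataclass

HORIZON = 5                            # hypothetical fixed number of planning steps
ACTIONS = ("boost_ptv", "spare_oar")   # hypothetical planning actions


@dataclass(frozen=True)
class PlanState:
    ptv_coverage: float   # fraction of the PTV receiving the prescription dose (0..1)
    oar_max_dose: float   # maximum OAR dose in Gy


def apply_action(state: PlanState, action: str) -> PlanState:
    """Toy stand-in for a TPS re-optimization; the numbers are illustrative only."""
    if action == "boost_ptv":
        return PlanState(min(1.0, state.ptv_coverage + 0.05), state.oar_max_dose + 1.0)
    if action == "spare_oar":
        return PlanState(max(0.0, state.ptv_coverage - 0.02), state.oar_max_dose - 2.0)
    raise ValueError(f"unknown action: {action}")


def plan_score(state: PlanState, oar_limit: float = 33.0, penalty: float = 10.0) -> float:
    """Objective: reward PTV coverage, penalize any dose above the assumed OAR limit."""
    excess = max(0.0, state.oar_max_dose - oar_limit)
    return state.ptv_coverage - penalty * excess / oar_limit


def reward(prev: PlanState, nxt: PlanState) -> float:
    """Reward for one step is the change in the plan score."""
    return plan_score(nxt) - plan_score(prev)


def greedy_policy(state: PlanState, step: int) -> str:
    """Pick the action with the best immediate reward (not a learned policy)."""
    return max(ACTIONS, key=lambda a: reward(state, apply_action(state, a)))


def run_episode(start: PlanState, policy) -> tuple[PlanState, float]:
    """Roll out a fixed number of steps and accumulate the reward."""
    state, total = start, 0.0
    for step in range(HORIZON):
        nxt = apply_action(state, policy(state, step))
        total += reward(state, nxt)
        state = nxt
    return state, total


if __name__ == "__main__":
    final, total = run_episode(PlanState(0.80, 30.0), greedy_policy)
    print(final, round(total, 3))
```

Because each step's reward is the change in plan score, the episode total equals the final score minus the starting score. In this toy setup the greedy policy boosts PTV coverage until the next boost would exceed the assumed OAR limit, then backs off, which mirrors the coverage-versus-sparing tradeoff the abstract describes.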