Reinforcement Learning with Efficient Active Feature Acquisition

Paper Title

Reinforcement Learning with Efficient Active Feature Acquisition

Authors

Haiyan Yin, Yingzhen Li, Sinno Jialin Pan, Cheng Zhang, Sebastian Tschiatschek

Abstract

Solving real-life sequential decision-making problems under partial observability involves an exploration-exploitation problem. To be successful, an agent needs to efficiently gather valuable information about the state of the world for making rewarding decisions. However, in real life, acquiring valuable information is often highly costly, e.g., in the medical domain, information acquisition might correspond to performing a medical test on a patient. This poses a significant challenge for the agent to perform optimally for the task while reducing the cost for information acquisition. In this paper, we propose a model-based reinforcement learning framework that learns an active feature acquisition policy to solve the exploration-exploitation problem during its execution. Key to the success is a novel sequential variational auto-encoder that learns high-quality representations from partially observed states, which are then used by the policy to maximize the task reward in a cost-efficient manner. We demonstrate the efficacy of our proposed framework in a control domain as well as using a medical simulator. In both tasks, our proposed method outperforms conventional baselines and results in policies with greater cost efficiency.
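
As an illustration only, and not the authors' implementation, the trade-off described in the abstract between task reward and information cost can be sketched as a single acquisition step. The function name `acquire_features`, the per-feature costs, and the simple "task reward minus acquisition cost" objective are all assumptions made for this sketch. The sequential variational auto-encoder that the paper uses to encode partially observed states is not modeled here.

```python
from typing import List, Optional, Sequence, Tuple


def acquire_features(
    full_state: Sequence[float],
    acquire: Sequence[bool],
    feature_costs: Sequence[float],
    task_reward: float,
) -> Tuple[List[Optional[float]], float]:
    """Hypothetical helper: reveal only the requested features and charge for them.

    Returns the partially observed state (None marks an unobserved feature)
    and the task reward reduced by the total acquisition cost.
    """
    if not (len(full_state) == len(acquire) == len(feature_costs)):
        raise ValueError("state, acquisition mask and costs must have equal length")

    # Features that were not acquired stay hidden from the agent.
    observed = [x if a else None for x, a in zip(full_state, acquire)]

    # Assumed cost model: each acquired feature adds a fixed cost.
    cost = sum(c for c, a in zip(feature_costs, acquire) if a)

    return observed, task_reward - cost


# Example: acquire the first and third of three features.
observed, net_reward = acquire_features(
    full_state=[1.0, 2.0, 3.0],
    acquire=[True, False, True],
    feature_costs=[0.1, 0.5, 0.2],
    task_reward=1.0,
)
```

Under these assumptions, a policy that acquires every feature sees the full state but pays the highest cost, while one that acquires nothing pays no cost but must act blind; a cost-efficient policy sits between the two.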
