Paper Title
Blind Decision Making: Reinforcement Learning with Delayed Observations
Paper Authors
Paper Abstract
Reinforcement learning typically assumes that the state resulting from the previous action is observed instantaneously and can therefore be used to make the next decision. However, this may not always hold. When the state update is unavailable, the decision is made partly blind, since it cannot rely on the current state information. This paper proposes an approach in which the delay in the knowledge of the state is accounted for, and decisions are made based on the information that is available, which may not include the current state. One alternative is to include the actions taken since the last observed state as part of the state representation; however, this enlarges the state space, making the problem more complex and slowing convergence. The proposed algorithm offers an alternative in which the state space is not enlarged relative to the case with no delay in the state update. Evaluations on basic RL environments further illustrate the improved performance of the proposed algorithm.
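To make the delayed-observation setting concrete, the following is a minimal sketch, assuming a toy chain environment, a fixed observation delay, and a random policy (all illustrative choices, not the paper's experimental setup). It shows a wrapper that returns stale observations, so the agent acts without knowledge of the current state.

from collections import deque
import random

class ToyChainEnv:
    """A simple 1-D chain: move left or right, reward 1 for reaching the goal state."""
    def __init__(self, n_states=10, goal=9):
        self.n_states, self.goal = n_states, goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action in {0: left, 1: right}
        self.state = max(0, min(self.n_states - 1, self.state + (1 if action == 1 else -1)))
        done = self.state == self.goal
        return self.state, (1.0 if done else 0.0), done

class DelayedObservationWrapper:
    """Delivers observations `delay` steps late; the agent acts on stale state information."""
    def __init__(self, env, delay=2):
        self.env, self.delay = env, delay
        self.buffer = deque()

    def reset(self):
        s = self.env.reset()
        # Pre-fill the buffer so the first `delay` returned observations are the initial state.
        self.buffer = deque([s] * (self.delay + 1), maxlen=self.delay + 1)
        return self.buffer[0]

    def step(self, action):
        s, r, done = self.env.step(action)
        self.buffer.append(s)
        # The agent only sees the observation from `delay` steps ago.
        return self.buffer[0], r, done

if __name__ == "__main__":
    env = DelayedObservationWrapper(ToyChainEnv(), delay=2)
    obs = env.reset()
    for _ in range(20):
        action = random.choice([0, 1])  # a blind/random policy for illustration
        obs, reward, done = env.step(action)
        if done:
            break

The augmented-state baseline mentioned in the abstract would instead return the stale observation together with the last `delay` actions, which enlarges the state space by a factor of |A|^delay; the paper's proposed algorithm is presented as avoiding this enlargement.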