Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach

Paper title

Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach

Authors

Siddhant Bhambri, Amrita Bhattacharjee, Dimitri Bertsekas

Abstract

In this paper we address the solution of the popular Wordle puzzle, using new reinforcement learning methods, which apply more generally to adaptive control of dynamic systems and to classes of Partially Observable Markov Decision Process (POMDP) problems. These methods are based on approximation in value space and the rollout approach, admit a straightforward implementation, and provide improved performance over various heuristic approaches. For the Wordle puzzle, they yield on-line solution strategies that are very close to optimal at relatively modest computational cost. Our methods are viable for more complex versions of Wordle and related search problems, for which an optimal strategy would be impossible to compute. They are also applicable to a wide range of adaptive sequential decision problems that involve an unknown or frequently changing environment whose parameters are estimated on-line.
