Title
Generalized Reinforcement Learning: Experience Particles, Action Operator, Reinforcement Field, Memory Association, and Decision Concepts
Authors
Abstract
Learning a control policy capable of adapting to time-varying and potentially evolving system dynamics has been a great challenge to mainstream reinforcement learning (RL). Mainly, the ever-changing system properties continuously affect how the RL agent interacts with the state space through its actions, which effectively (re-)introduces concept drift into the underlying policy learning process. We postulate that higher adaptability of the control policy can be achieved by characterizing and representing actions with extra "degrees of freedom," thereby allowing greater flexibility to adjust to variations in the actions' "behavioral" outcomes, including how these actions are carried out in real time and shifts in the action set itself. This paper proposes a Bayesian-flavored generalized RL framework by first establishing the notion of a parametric action model to better cope with uncertainty and fluid action behaviors, followed by introducing the notion of a reinforcement field as a physics-inspired construct established through "polarized experience particles" maintained in the RL agent's working memory. These particles effectively encode the agent's dynamic learning experience, which evolves over time in a self-organizing way. Using the reinforcement field as a substrate, we further generalize the policy search to incorporate high-level decision concepts by viewing past memory as an implicit graph structure, in which memory instances, or particles, are interconnected with their degrees of associability/similarity defined and quantified, such that the "associative memory" principle can be consistently applied to establish and augment the learning agent's evolving world model.
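To make the abstract's central construct concrete, the following is a minimal illustrative sketch, not the paper's actual method: it assumes experience particles carry a signed "polarity" (here, a normalized reward), assumes a Gaussian kernel as one possible associability/similarity measure between memory instances, and evaluates a toy "reinforcement field" at a query state as a polarity-weighted kernel sum. All class and function names (`ExperienceParticle`, `similarity`, `field_value`) are hypothetical.

```python
import math

class ExperienceParticle:
    """A single memory instance: a state, the action taken there, and a
    signed 'polarity' (e.g., a normalized reward) that charges the particle.
    Hypothetical structure, for illustration only."""
    def __init__(self, state, action, polarity):
        self.state = state        # tuple of floats
        self.action = action      # discrete action label
        self.polarity = polarity  # positive = reinforcing, negative = repelling

def similarity(s1, s2, bandwidth=1.0):
    """Gaussian (RBF) kernel: one plausible way to quantify the 'degree of
    associability' between two memory states."""
    d2 = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return math.exp(-d2 / (2 * bandwidth ** 2))

def field_value(particles, query_state, action):
    """Evaluate the toy reinforcement field at a query state for one action:
    a polarity-weighted sum of kernel similarities over matching particles."""
    return sum(p.polarity * similarity(p.state, query_state)
               for p in particles if p.action == action)

# Usage: the agent prefers the action whose field value is highest at its
# current state, i.e., the one most associated with positively polarized memory.
memory = [
    ExperienceParticle((0.0, 0.0), "left", +1.0),
    ExperienceParticle((0.1, 0.0), "right", -0.5),
    ExperienceParticle((2.0, 2.0), "right", +1.0),
]
state = (0.05, 0.0)
best = max({"left", "right"}, key=lambda a: field_value(memory, state, a))
```

In this sketch, the nearby positively polarized "left" particle dominates the field near `state`, while the distant positive "right" particle is attenuated by the kernel; the implicit graph structure mentioned in the abstract would arise from treating these pairwise similarities as weighted edges between particles.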