Paper Title

Modelling non-reinforced preferences using selective attention

Authors

Noor Sajid, Panagiotis Tigas, Zafeirios Fountas, Qinghai Guo, Alexey Zakharov, Lancelot Da Costa

Abstract

How can artificial agents learn non-reinforced preferences to continuously adapt their behaviour to a changing environment? We decompose this question into two challenges: ($i$) encoding diverse memories and ($ii$) selectively attending to these for preference formation. Our proposed \emph{no}n-\emph{re}inforced preference learning mechanism using selective attention, \textsc{Nore}, addresses both by leveraging the agent's world model to collect a diverse set of experiences, which are interleaved with imagined roll-outs to encode memories. These memories are selectively attended to, using attention and gating blocks, to update the agent's preferences. We validate \textsc{Nore} in a modified OpenAI Gym FrozenLake environment (without any external signal), with and without volatility, under a fixed model of the environment -- and compare its behaviour to \textsc{Pepper}, a Hebbian preference learning mechanism. We demonstrate that \textsc{Nore} provides a straightforward framework to induce exploratory preferences in the absence of external signals.
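
As a minimal illustration of the attention-and-gating preference update described in the abstract (not the authors' implementation), the PyTorch sketch below shows one plausible reading: the current preference vector queries a bank of encoded memories via multi-head attention, and a GRU-style gate decides how much of the attended summary overwrites the previous preferences. All module names, dimensions, and the choice of a GRU cell as the gating block are assumptions for illustration.

    # Hypothetical sketch of selective attention over memories for preference
    # updates; dimensions, names, and the GRU-style gate are assumptions.
    import torch
    import torch.nn as nn

    class SelectivePreferenceUpdate(nn.Module):
        """Attend over encoded memories, then gate the result into the preferences."""

        def __init__(self, dim: int = 64, num_heads: int = 4):
            super().__init__()
            # Attention block: the preference state queries the memory bank.
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            # Gating block: a GRU cell controls how much of the attended
            # summary replaces the previous preference vector.
            self.gate = nn.GRUCell(dim, dim)

        def forward(self, preferences: torch.Tensor, memories: torch.Tensor) -> torch.Tensor:
            # preferences: (batch, dim) current preference state
            # memories:    (batch, num_memories, dim) encoded experiences/roll-outs
            query = preferences.unsqueeze(1)                    # (batch, 1, dim)
            attended, _ = self.attn(query, memories, memories)  # selective read
            return self.gate(attended.squeeze(1), preferences)  # gated update

    # Usage: update a batch of preference vectors from 32 encoded memories each.
    update = SelectivePreferenceUpdate(dim=64)
    prefs = torch.zeros(8, 64)            # initial preferences for a batch of 8
    mem_bank = torch.randn(8, 32, 64)     # encoded memories (collected + imagined)
    prefs = update(prefs, mem_bank)       # -> shape (8, 64)

This captures the two ingredients the abstract names (attention over memories, then gating into the preference state); the actual architecture, memory encoding, and update schedule are specified in the paper itself.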
