Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

Paper Title

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

Authors

Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu

Abstract

In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal. In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals, which is beneficial for a wide range of tasks. Motivated by this observation, we propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states. This objective encourages the agent to take control of its environment. Subsequently, we derive a surrogate objective of the proposed reward function, which can be optimized efficiently. Lastly, we evaluate the developed framework in different robotic manipulation and navigation tasks and demonstrate the efficacy of our approach. A video showing experimental results is available at https://youtu.be/CT4CKMWBYz0
