Paper Title
Low Dimensional State Representation Learning with Reward-shaped Priors
Paper Authors
Paper Abstract
Reinforcement Learning has been able to solve many complicated robotics tasks in an end-to-end fashion, without any need for feature engineering. However, learning the optimal policy directly from the sensory inputs, i.e., the observations, often requires processing and storing a huge amount of data. In the context of robotics, data from real robot hardware is usually very costly to obtain, so solutions that achieve high sample efficiency are needed. We propose a method that learns a mapping from the observations to a lower-dimensional state space. This mapping is learned with unsupervised learning, using loss functions shaped to incorporate prior knowledge of the environment and the task. Using samples from this state space, the optimal policy is learned quickly and efficiently. We test the method on several mobile robot navigation tasks in a simulation environment and on a real robot.
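The abstract does not spell out the shaping losses, but a minimal sketch of the general idea — an observation encoder trained with prior-shaped loss terms in the spirit of robotic priors (e.g., temporal coherence and proportionality) — might look as follows. The specific loss terms, network sizes, and dummy data here are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch only: encoder trained with prior-shaped losses (assumed losses,
# in the style of robotic priors; not taken from the paper).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps high-dimensional observations to a low-dimensional state."""
    def __init__(self, obs_dim: int, state_dim: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, state_dim),
        )

    def forward(self, obs):
        return self.net(obs)

def temporal_coherence_loss(s_t, s_t1):
    # Prior: the state should change slowly between consecutive steps.
    return ((s_t1 - s_t) ** 2).sum(dim=1).mean()

def proportionality_loss(s_t, s_t1, s_u, s_u1):
    # Prior: two transitions driven by the same action should produce
    # state changes of similar magnitude.
    d1 = (s_t1 - s_t).norm(dim=1)
    d2 = (s_u1 - s_u).norm(dim=1)
    return ((d1 - d2) ** 2).mean()

obs_dim, state_dim = 100, 5
encoder = Encoder(obs_dim, state_dim)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Placeholder batch: two observation transitions assumed to share the
# same action (random data stands in for recorded robot experience).
o_t, o_t1 = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
o_u, o_u1 = torch.randn(32, obs_dim), torch.randn(32, obs_dim)

s_t, s_t1 = encoder(o_t), encoder(o_t1)
s_u, s_u1 = encoder(o_u), encoder(o_u1)
loss = (temporal_coherence_loss(s_t, s_t1)
        + proportionality_loss(s_t, s_t1, s_u, s_u1))
opt.zero_grad()
loss.backward()
opt.step()
```

Once such an encoder is trained, the RL agent would operate on the low-dimensional states it produces rather than on the raw observations, which is what enables the sample-efficient policy learning the abstract describes.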