Paper Title

NEARL: Non-Explicit Action Reinforcement Learning for Robotic Control

Paper Authors

Nan Lin, Yuxuan Li, Yujun Zhu, Ruolin Wang, Xiayu Zhang, Jianmin Ji, Keke Tang, Xiaoping Chen, Xinming Zhang

Paper Abstract

Traditionally, reinforcement learning methods predict the next action based on the current state. However, in many situations, directly applying actions to control systems or robots is dangerous and may lead to unexpected behaviors, because actions are rather low-level. In this paper, we propose a novel hierarchical reinforcement learning framework without explicit actions. Our meta policy attempts to produce the next optimal state, while the actual action is generated by an inverse dynamics model. To stabilize the training process, we integrate adversarial learning and an information bottleneck into our framework. Under our framework, widely available state-only demonstrations can be exploited effectively for imitation learning. Moreover, prior knowledge and constraints can be applied to the meta policy. We test our algorithm on simulation tasks, both alone and in combination with imitation learning. The experimental results demonstrate the reliability and robustness of our algorithm.
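To make the division of labor described in the abstract concrete, below is a minimal Python sketch of the non-explicit-action control loop: a meta policy decides in state space, and an inverse dynamics model recovers the low-level action. All names here (MetaPolicy, InverseDynamicsModel, control_step) are hypothetical placeholders for illustration, not the authors' actual implementation; the learned networks are stubbed out.

```python
import numpy as np

class MetaPolicy:
    """Predicts the next optimal state s_{t+1} given the current state s_t."""
    def predict_next_state(self, state: np.ndarray) -> np.ndarray:
        # Placeholder: in the paper this would be a learned policy network.
        return state  # identity stand-in

class InverseDynamicsModel:
    """Recovers the action that transitions s_t to s_{t+1}."""
    def action(self, state: np.ndarray, next_state: np.ndarray) -> np.ndarray:
        # Placeholder: in the paper this would be a learned inverse dynamics network.
        return next_state - state  # naive difference stand-in

def control_step(state: np.ndarray, meta: MetaPolicy, idm: InverseDynamicsModel) -> np.ndarray:
    # The meta policy never outputs an action directly; it proposes a target state.
    target_state = meta.predict_next_state(state)
    # The low-level action is produced by the inverse dynamics model.
    return idm.action(state, target_state)

# Usage: one control step in a hypothetical 3-dimensional state space.
meta, idm = MetaPolicy(), InverseDynamicsModel()
action = control_step(np.zeros(3), meta, idm)
```

Because the meta policy operates purely in state space, state-only demonstrations can supervise it directly, and safety constraints or prior knowledge can be imposed on the proposed target states before any action is executed.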
