Paper Title
Towards General and Autonomous Learning of Core Skills: A Case Study in Locomotion
Paper Authors
Paper Abstract
Modern Reinforcement Learning (RL) algorithms promise to solve difficult motor control problems directly from raw sensory inputs. Their attraction is due in part to their generality: they can learn a solution from a reasonably specified reward and minimal prior knowledge, even in situations where designing a controller by hand would be difficult or expensive for a human expert. For RL to truly make good on this promise, however, we need algorithms and learning setups that can work across a broad range of problems with minimal problem-specific adjustments or engineering. In this paper, we study this notion of generality in the locomotion domain. We develop a learning framework that can learn sophisticated locomotion behaviors for a wide spectrum of legged robots, such as bipeds, tripeds, quadrupeds, and hexapods, including wheeled variants. Our learning framework relies on a data-efficient, off-policy, multi-task RL algorithm and a small set of reward functions that are semantically identical across robots. To underline the general applicability of the method, we keep the hyper-parameter settings and reward definitions constant across experiments and rely exclusively on on-board sensing. For nine different types of robots, including a real-world quadruped, we demonstrate that the same algorithm can rapidly learn diverse and reusable locomotion skills without any platform-specific adjustments or additional instrumentation of the learning setup.
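
To make the notion of rewards that are "semantically identical across robots" concrete, the sketch below shows a minimal, hypothetical set of task rewards computed only from quantities available through on-board sensing (body-frame velocity, gyroscope, and IMU gravity estimates). The observation keys, task names, and reward expressions are illustrative assumptions for this sketch and do not reproduce the paper's exact reward definitions or learning algorithm; the point is that the same task dictionary can be reused unchanged across different embodiments.

import numpy as np

def forward_reward(obs):
    # Reward forward progress: body-frame velocity along the robot's x-axis.
    return float(obs["body_velocity"][0])

def turn_left_reward(obs):
    # Reward a positive yaw rate as measured by the on-board gyroscope.
    return float(obs["gyro"][2])

def stand_reward(obs):
    # Reward keeping the torso upright: negative z-component of the
    # gravity direction expressed in the torso frame (from the IMU).
    return float(-obs["gravity_in_body_frame"][2])

# The same set of tasks is reused for every robot; only the embodiment
# (and hence the dynamics that generate `obs`) changes.
TASKS = {
    "walk_forward": forward_reward,
    "turn_left": turn_left_reward,
    "stand": stand_reward,
}

def multi_task_rewards(obs):
    """Evaluate every task reward for one transition, as a multi-task,
    off-policy learner might do when sharing experience across tasks."""
    return {name: fn(obs) for name, fn in TASKS.items()}

# Example with dummy sensor readings (units and scales are illustrative):
obs = {
    "body_velocity": np.array([0.4, 0.0, 0.0]),
    "gyro": np.array([0.0, 0.0, 0.1]),
    "gravity_in_body_frame": np.array([0.0, 0.0, -1.0]),
}
print(multi_task_rewards(obs))

Because every reward depends only on proprioceptive and inertial signals that any of the nine platforms can measure on board, swapping the robot requires no change to this specification, which is the sense of generality the abstract emphasizes.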