High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards

Paper Title

High Acceleration Reinforcement Learning for Real-World Juggling with Binary Rewards

Authors

Kai Ploeger, Michael Lutter, Jan Peters

Abstract

Robots that can learn in the physical world will be important to enable robots to escape their stiff and pre-programmed movements. For dynamic high-acceleration tasks, such as juggling, learning in the real world is particularly challenging, as one must push the limits of the robot and its actuation without harming the system, amplifying the necessity of sample efficiency and safety for robot learning algorithms. In contrast to prior work, which mainly focuses on the learning algorithm, we propose a learning system that directly incorporates these requirements in the design of the policy representation, initialization, and optimization. We demonstrate that this system enables the high-speed Barrett WAM manipulator to learn juggling two balls from 56 minutes of experience with a binary reward signal. The final policy juggles continuously for up to 33 minutes or about 4500 repeated catches. The videos documenting the learning process and the evaluation can be found at https://sites.google.com/view/jugglingbot

Paper Title