Paper Title

Sample-efficient Reinforcement Learning in Robotic Table Tennis

Paper Authors

Jonas Tebbe, Lukas Krauch, Yapeng Gao, Andreas Zell

Paper Abstract

Reinforcement learning (RL) has achieved some impressive recent successes in various computer games and simulations. Most of these successes are based on having large numbers of episodes from which the agent can learn. In typical robotic applications, however, the number of feasible attempts is very limited. In this paper we present a sample-efficient RL algorithm applied to the example of a table tennis robot. In table tennis every stroke is different, with varying placement, speed and spin. An accurate return therefore has to be found depending on a high-dimensional continuous state space. To make learning in few trials possible, the method is embedded into our robot system. In this way we can use a one-step environment. The state space depends on the ball at hitting time (position, velocity, spin) and the action is the racket state (orientation, velocity) at hitting. An actor-critic based deterministic policy gradient algorithm was developed for accelerated learning. Our approach performs competitively both in a simulation and on the real robot in a number of challenging scenarios. Accurate results are obtained without pre-training in under 200 episodes of training. The video presenting our experiments is available at https://youtu.be/uRAtdoL6Wpw.
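
To make the one-step formulation concrete, here is a minimal sketch in PyTorch of the state/action interface and an actor-critic deterministic policy gradient update. The dimensions, network sizes, and hyperparameters are illustrative assumptions, not values from the paper, and the update follows the generic deterministic policy gradient (DDPG-style) pattern rather than the authors' exact accelerated variant.

```python
# Minimal sketch of the one-step RL setup described in the abstract.
# Assumed (hypothetical) dimensions: 9-D state = ball position (3) +
# velocity (3) + spin (3) at hitting time; 5-D action = racket
# orientation (2) + racket velocity (3) at hitting. The paper's exact
# parameterization may differ.
import torch
import torch.nn as nn

STATE_DIM = 9
ACTION_DIM = 5

actor = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM), nn.Tanh(),  # actions normalized to [-1, 1]
)
critic = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(states, actions, rewards):
    """One gradient step on a batch of (state, action, reward) strokes.

    states: (B, STATE_DIM), actions: (B, ACTION_DIM), rewards: (B, 1).
    In a one-step environment each stroke is a complete episode, so the
    critic target is the observed reward itself: no bootstrapping and
    no target networks are required.
    """
    # Critic: regress Q(s, a) onto the immediate reward.
    q = critic(torch.cat([states, actions], dim=1))
    critic_loss = nn.functional.mse_loss(q, rewards)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient, i.e. ascend the critic's
    # estimate of the return for the actor's own action.
    actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

The one-step structure is what makes the sample efficiency plausible: with the reward observed immediately after each stroke, the critic solves a supervised regression problem rather than a bootstrapped temporal-difference problem, which is consistent with the abstract's report of accurate results in under 200 training episodes.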
