Paper title
Learning Time-optimized Path Tracking with or without Sensory Feedback
Paper authors
Paper abstract
In this paper, we present a learning-based approach that allows a robot to quickly follow a reference path defined in joint space without exceeding limits on the position, velocity, acceleration and jerk of each robot joint. Contrary to offline methods for time-optimal path parameterization, the reference path can be changed during motion execution. In addition, our approach can utilize sensory feedback, for instance, to follow a reference path with a bipedal robot without losing balance. With our method, the robot is controlled by a neural network that is trained via reinforcement learning using data generated by a physics simulator. From a mathematical perspective, the problem of tracking a reference path in a time-optimized manner is formalized as a Markov decision process. Each state includes a fixed number of waypoints specifying the next part of the reference path. The action space is designed in such a way that all resulting motions comply with the specified kinematic joint limits. Finally, the reward function reflects the trade-off between the execution time, the deviation from the desired reference path, and optional additional objectives like balancing. We evaluate our approach with and without additional objectives and show that time-optimized path tracking can be successfully learned for both industrial and humanoid robots. In addition, we demonstrate that networks trained in simulation can be successfully transferred to a real robot.
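To illustrate the idea of an action space whose outputs always respect kinematic limits, here is a minimal single-joint sketch in Python. It is not the authors' implementation: the function names, the mapping from a normalized action in [-1, 1] to an acceleration, and the assumption that acceleration changes linearly within a time step are all hypothetical choices made for this example. The sketch only enforces the jerk limit, the acceleration limit, and a one-step velocity bound. It ignores position limits and does no look-ahead, so it does not guarantee that a feasible action exists in later steps.

```python
def feasible_acceleration_range(v, a, dt, v_max, a_max, j_max):
    """Return (lo, hi) bounds on the acceleration at the end of the next step.

    Assumes |a| <= a_max on entry and a linear acceleration profile
    within the time step (both are assumptions of this sketch).
    """
    # Jerk limit: the acceleration may change by at most j_max * dt.
    # Acceleration limit: the result must stay within [-a_max, a_max].
    lo = max(a - j_max * dt, -a_max)
    hi = min(a + j_max * dt, a_max)

    # One-step velocity bound. With linear acceleration interpolation,
    # v_next = v + 0.5 * (a + a_next) * dt, so |v_next| <= v_max gives:
    v_lo = 2.0 * (-v_max - v) / dt - a
    v_hi = 2.0 * (v_max - v) / dt - a

    # Tighten the range, but never leave the jerk/acceleration range.
    # If the velocity bound cannot be met in one step, the range
    # collapses to the closest jerk-feasible acceleration.
    new_lo = min(max(lo, v_lo), hi)
    new_hi = max(min(hi, v_hi), lo)
    return new_lo, new_hi


def action_to_acceleration(action, v, a, dt, v_max, a_max, j_max):
    """Map a normalized action in [-1, 1] onto the feasible range."""
    action = max(-1.0, min(1.0, action))  # guard against out-of-range outputs
    lo, hi = feasible_acceleration_range(v, a, dt, v_max, a_max, j_max)
    return lo + 0.5 * (action + 1.0) * (hi - lo)


if __name__ == "__main__":
    # Joint at rest: the jerk limit (10 * 0.1 = 1) bounds the next acceleration.
    print(action_to_acceleration(1.0, v=0.0, a=0.0, dt=0.1,
                                 v_max=1.0, a_max=2.0, j_max=10.0))
    # Joint close to its velocity limit: the velocity bound becomes active.
    print(action_to_acceleration(1.0, v=0.99, a=0.0, dt=0.1,
                                 v_max=1.0, a_max=2.0, j_max=10.0))
```

Because the network output is rescaled into a range computed from the current joint state, any action the policy produces maps to a setpoint inside that range, so constraint handling does not have to be learned through the reward.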