Paper title
Reinforcement Learning for Vision-based Object Manipulation with Non-parametric Policy and Action Primitives
Authors
Abstract
Object manipulation is a crucial ability for service robots, but it is hard to solve with reinforcement learning for reasons such as low sample efficiency. In this paper, we propose a novel framework, AP-NPQL (Non-Parametric Q Learning with Action Primitives), that efficiently solves object manipulation from visual input and sparse rewards by combining a non-parametric policy for reinforcement learning with an appropriate behavior prior for object manipulation. We evaluate the efficiency and performance of AP-NPQL on four simulated object manipulation tasks: pushing a plate, stacking boxes, flipping a cup, and picking and placing a plate. AP-NPQL outperforms state-of-the-art algorithms based on parametric policies and behavior priors in both learning time and task success rate. We also transfer the policy learned for the plate pick-and-place task to a real robot in a sim-to-real manner and validate it there.