Paper Title

Deep Reinforcement Learning with Interactive Feedback in a Human-Robot Environment

Paper Authors

Ithan Moreira, Javier Rivas, Francisco Cruz, Richard Dazeley, Angel Ayala, Bruno Fernandes

Paper Abstract

Robots are extending their presence in domestic environments every day, being more common to see them carrying out tasks in home scenarios. In the future, robots are expected to increasingly perform more complex tasks and, therefore, be able to acquire experience from different sources as quickly as possible. A plausible approach to address this issue is interactive feedback, where a trainer advises a learner on which actions should be taken from specific states to speed up the learning process. Moreover, deep reinforcement learning has been recently widely utilized in robotics to learn the environment and acquire new skills autonomously. However, an open issue when using deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human-robot scenario. We compare three different learning methods using a simulated robotic arm for the task of organizing different objects; the proposed methods are (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent-IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human-IDeepRL). We demonstrate that interactive approaches provide advantages for the learning process. The obtained results show that a learner agent, using either agent-IDeepRL or human-IDeepRL, completes the given task earlier and has fewer mistakes compared to the autonomous DeepRL approach.
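The interactive-feedback idea described in the abstract, where a trainer occasionally tells the learner which action to take in the current state, can be illustrated with a minimal sketch. This is not the paper's method (which uses deep networks on raw images); it is a toy tabular Q-learning agent on a short chain task, and the names `train`, `advisor`, and `feedback_prob` are assumptions made for this example. The `advisor` callable stands in for either a previously trained agent (agent-IDeepRL) or a human (human-IDeepRL).

```python
import random

def train(advisor=None, episodes=200, feedback_prob=0.3, seed=0):
    """Tabular Q-learning on a 5-state chain; an optional advisor
    can suggest actions, standing in for interactive feedback."""
    rng = random.Random(seed)
    n_states, actions = 5, [0, 1]          # action 1 moves right, 0 moves left
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    alpha, gamma, eps = 0.5, 0.9, 0.1
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:           # goal is the rightmost state
            if advisor is not None and rng.random() < feedback_prob:
                a = advisor(s)             # follow the trainer's advice
            elif rng.random() < eps:
                a = rng.choice(actions)    # autonomous exploration
            else:
                a = max(actions, key=lambda b: q[(s, b)])  # greedy choice
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions)
                                  - q[(s, a)])
            s = s2
    return q

# An advisor that always points toward the goal, playing the role of the
# trained artificial agent or the human trainer in the interactive setups.
q_advised = train(advisor=lambda s: 1)
```

In this toy setting the advised runs reach the goal in fewer steps during early episodes, mirroring the abstract's claim that interactive feedback speeds up learning and reduces mistakes relative to fully autonomous exploration.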
