Paper Title

ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning

Authors

Sean Chen, Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine

Abstract

Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural 'default' interface. Reinforcement learning from online user feedback on the system's performance presents a natural solution to this problem, and enables the interface to adapt to individual users. However, this approach tends to require a large amount of human-in-the-loop training data, especially when feedback is sparse. We propose a hierarchical solution that learns efficiently from sparse user feedback: we use offline pre-training to acquire a latent embedding space of useful, high-level robot behaviors, which, in turn, enables the system to focus on using online user feedback to learn a mapping from user inputs to desired high-level behaviors. The key insight is that access to a pre-trained policy enables the system to learn more from sparse rewards than a naïve RL algorithm: using the pre-trained policy, the system can make use of successful task executions to relabel, in hindsight, what the user actually meant to do during unsuccessful executions. We evaluate our method primarily through a user study with 12 participants who perform tasks in three simulated robotic manipulation domains using a webcam and their eye gaze: flipping light switches, opening a shelf door to reach objects inside, and rotating a valve. The results show that our method successfully learns to map 128-dimensional gaze features to 7-dimensional joint torques from sparse rewards in under 10 minutes of online training, and seamlessly helps users who employ different gaze strategies, while adapting to distributional shift in webcam inputs, tasks, and environments.
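
The abstract describes two mechanisms: a hierarchical interface, in which an online-learned encoder maps gaze features to a latent high-level behavior that a frozen, offline pre-trained policy decodes into joint torques, and hindsight relabeling of unsuccessful episodes once some latent behavior is known to succeed. Below is a minimal PyTorch sketch of both ideas. The 128-dimensional gaze input and 7-dimensional torque output come from the abstract; the latent dimension, robot state dimension, network architectures, and the supervised relabeling loss are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

# Dimensions: 128-dim gaze features and 7-dim torques come from the abstract;
# the latent and state dimensions are illustrative assumptions.
GAZE_DIM, LATENT_DIM, STATE_DIM, TORQUE_DIM = 128, 8, 10, 7

class GazeEncoder(nn.Module):
    """Online-learned mapping from user gaze features to a latent behavior."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(GAZE_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))

    def forward(self, gaze):
        return self.net(gaze)

class LowLevelPolicy(nn.Module):
    """Stand-in for the offline pre-trained policy: decodes a latent
    behavior plus the robot state into joint torques (frozen online)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM + STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, TORQUE_DIM))

    def forward(self, z, state):
        return self.net(torch.cat([z, state], dim=-1))

def hindsight_relabel(failed_episodes, z_success):
    """Pair every gaze frame from unsuccessful episodes of a task with the
    latent behavior that later succeeded at that task, turning a sparse
    success signal into a supervised dataset for the encoder."""
    return [(gaze, z_success)
            for episode in failed_episodes
            for gaze, _state in episode]

def train_encoder(encoder, dataset, lr=1e-3, epochs=5):
    """Fit the encoder to the relabeled (gaze, latent) pairs."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(epochs):
        for gaze, z_target in dataset:
            loss = nn.functional.mse_loss(encoder(gaze), z_target)
            opt.zero_grad()
            loss.backward()
            opt.step()

if __name__ == "__main__":
    encoder, low_level = GazeEncoder(), LowLevelPolicy()
    for p in low_level.parameters():      # pre-trained policy stays frozen
        p.requires_grad_(False)

    # One control step: gaze -> latent intent -> joint torques.
    torques = low_level(encoder(torch.randn(GAZE_DIM)), torch.randn(STATE_DIM))
    print(torques.shape)                  # torch.Size([7])

    # Relabel two failed episodes with a latent that later succeeded.
    z_success = torch.randn(LATENT_DIM)
    failed = [[(torch.randn(GAZE_DIM), torch.randn(STATE_DIM))
               for _ in range(5)] for _ in range(2)]
    train_encoder(encoder, hindsight_relabel(failed, z_success))
```

The sketch shows why relabeling helps with sparse rewards: once any latent behavior is known to accomplish the task, every gaze frame recorded during failed attempts at that task becomes a labeled training pair, so the input-to-behavior mapping receives dense supervision without additional user feedback.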
