Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers

Paper Title

Real-Time Reinforcement Learning for Vision-Based Robotics Utilizing Local and Remote Computers

Authors

Yan Wang, Gautham Vasan, A. Rupam Mahmood

Abstract

Real-time learning is crucial for robotic agents adapting to ever-changing, non-stationary environments. A common setup for a robotic agent is to have two different computers simultaneously: a resource-limited local computer tethered to the robot and a powerful remote computer connected wirelessly. Given such a setup, it is unclear to what extent the performance of a learning system can be affected by resource limitations and how to efficiently use the wirelessly connected powerful computer to compensate for any performance loss. In this paper, we implement a real-time learning system called the Remote-Local Distributed (ReLoD) system to distribute computations of two deep reinforcement learning (RL) algorithms, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO), between a local and a remote computer. The performance of the system is evaluated on two vision-based control tasks developed using a robotic arm and a mobile robot. Our results show that SAC's performance degrades heavily on a resource-limited local computer. Strikingly, when all computations of the learning system are deployed on a remote workstation, SAC fails to compensate for the performance loss, indicating that, without careful consideration, using a powerful remote computer may not result in performance improvement. However, a carefully chosen distribution of computations of SAC consistently and substantially improves its performance on both tasks. On the other hand, the performance of PPO remains largely unaffected by the distribution of computations. In addition, when all computations happen solely on a powerful tethered computer, the performance of our system remains on par with an existing system that is well-tuned for using a single machine. ReLoD is the only publicly available system for real-time RL that applies to multiple robots for vision-based tasks.
