Paper Title

Hardware as Policy: Mechanical and Computational Co-Optimization using Deep Reinforcement Learning

Authors

Chen, Tianjian, He, Zhanpeng, Ciocarlie, Matei

Abstract

Deep Reinforcement Learning (RL) has shown great success in learning complex control policies for a variety of applications in robotics. However, in most such cases, the hardware of the robot has been considered immutable, modeled as part of the environment. In this study, we explore the problem of learning hardware and control parameters together in a unified RL framework. To achieve this, we propose to model the robot body as a "hardware policy", analogous to and optimized jointly with its computational counterpart. We show that, by modeling such hardware policies as auto-differentiable computational graphs, the ensuing optimization problem can be solved efficiently by gradient-based algorithms from the Policy Optimization family. We present two such design examples: a toy mass-spring problem, and a real-world problem of designing an underactuated hand. We compare our method against traditional co-optimization approaches, and also demonstrate its effectiveness by building a physical prototype based on the learned hardware parameters. Videos and more details are available at https://roamlab.github.io/hwasp/ .
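The abstract's central idea — folding hardware parameters into the same parameter vector as the controller and updating both with one gradient signal — can be illustrated on the toy mass-spring example it mentions. The sketch below is an assumption-laden simplification, not the paper's implementation: it uses finite-difference gradients in place of auto-differentiable computational graphs, a deterministic rollout cost in place of an RL return, and all names (`rollout_cost`, `grad`, the specific cost weights) are hypothetical.

```python
# Toy sketch (illustrative assumptions, not the paper's setup): jointly
# optimize a hardware parameter (spring stiffness k) and a control
# parameter (damping gain g) for a mass on a spring, using simple
# finite-difference gradients as a stand-in for auto-differentiation.

def rollout_cost(k, g, x0=1.0, v0=0.0, m=1.0, dt=0.01, steps=500):
    """Simulate the mass-spring system with control u = -g*v and return
    the accumulated cost (distance from rest plus control effort)."""
    x, v, cost = x0, v0, 0.0
    for _ in range(steps):
        u = -g * v                    # computational policy: velocity feedback
        a = (-k * x + u) / m          # spring force acts as the "hardware policy"
        v += a * dt                   # explicit Euler integration
        x += v * dt
        cost += (x * x + 0.01 * u * u) * dt
    return cost

def grad(f, params, eps=1e-5):
    """Central finite-difference gradient of f at params."""
    out = []
    for i in range(len(params)):
        hi = list(params); hi[i] += eps
        lo = list(params); lo[i] -= eps
        out.append((f(*hi) - f(*lo)) / (2 * eps))
    return out

# Hardware and control parameters share one gradient-descent update,
# mirroring the unified co-optimization the abstract describes.
params = [1.0, 0.1]                   # initial stiffness k and gain g
lr = 0.5
for _ in range(200):
    dk, dg = grad(rollout_cost, params)
    params[0] = max(0.1, params[0] - lr * dk)   # keep stiffness physical
    params[1] = max(0.0, params[1] - lr * dg)
```

In the actual method, the rollout would be a differentiable computational graph, so both gradients come from backpropagation rather than finite differences, and the update is performed by a policy-optimization algorithm rather than plain gradient descent.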
