Paper Title

Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions

Authors

Alejandro Escontrela, Xue Bin Peng, Wenhao Yu, Tingnan Zhang, Atil Iscen, Ken Goldberg, Pieter Abbeel

Abstract

Training a high-dimensional simulated agent with an under-specified reward function often leads the agent to learn physically infeasible strategies that are ineffective when deployed in the real world. To mitigate these unnatural behaviors, reinforcement learning practitioners often use complex reward functions that encourage physically plausible behaviors. However, creating such hand-designed rewards often requires a tedious, labor-intensive tuning process, and the resulting rewards might not generalize easily across platforms and tasks. We propose substituting complex reward functions with "style rewards" learned from a dataset of motion capture demonstrations. A learned style reward can be combined with an arbitrary task reward to train policies that perform tasks using naturalistic strategies. These natural strategies can also facilitate transfer to the real world. We build upon Adversarial Motion Priors (AMP), an approach from the computer graphics domain that encodes a style reward from a dataset of reference motions. Using AMP, we demonstrate that an adversarial approach to training policies can produce behaviors that transfer to a real quadrupedal robot without requiring complex reward functions. We also demonstrate that an effective style reward can be learned from a few seconds of motion capture data gathered from a German Shepherd. This reward leads to energy-efficient locomotion strategies with natural gait transitions.
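The abstract says a style reward is learned adversarially from reference motions and then combined with a task reward. The sketch below illustrates that idea. It follows the least-squares formulation from the original AMP work as we understand it, and is not this paper's implementation. In that formulation, a discriminator scores state transitions (s, s'). It is trained toward +1 on motion capture transitions and toward -1 on policy transitions, with a gradient penalty. Its score D is mapped to a bounded reward max(0, 1 - 0.25 (D - 1)^2). Several pieces are assumptions chosen to keep the example small and runnable:

- the linear discriminator (AMP uses a neural network);
- the toy state features;
- the gradient-penalty weight `w_gp`;
- the task/style weights `w_task` and `w_style`.

```python
from typing import Sequence


def discriminator(w: Sequence[float], b: float,
                  s: Sequence[float], s_next: Sequence[float]) -> float:
    """Hypothetical linear discriminator D(s, s') = w . [s; s'] + b.

    AMP uses a neural network here; a linear model keeps the sketch small.
    """
    x = list(s) + list(s_next)
    if len(x) != len(w):
        raise ValueError("weight vector must match the concatenated transition")
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def discriminator_loss(d_ref: Sequence[float], d_pol: Sequence[float],
                       w: Sequence[float], w_gp: float = 10.0) -> float:
    """Least-squares GAN loss: push D toward +1 on reference (mocap) transitions
    and toward -1 on policy transitions, plus a gradient penalty.

    For the linear D above, the gradient of D with respect to its input is w
    everywhere, so the penalty reduces to ||w||^2. w_gp is an illustrative value.
    """
    loss_ref = sum((d - 1.0) ** 2 for d in d_ref) / len(d_ref)
    loss_pol = sum((d + 1.0) ** 2 for d in d_pol) / len(d_pol)
    penalty = sum(wi * wi for wi in w)
    return loss_ref + loss_pol + 0.5 * w_gp * penalty


def style_reward(d: float) -> float:
    """Map a discriminator score to a bounded style reward in [0, 1]."""
    return max(0.0, 1.0 - 0.25 * (d - 1.0) ** 2)


def total_reward(r_task: float, r_style: float,
                 w_task: float = 0.5, w_style: float = 0.5) -> float:
    """Weighted sum of task and style rewards; the weights are hypothetical."""
    return w_task * r_task + w_style * r_style


if __name__ == "__main__":
    w, b = [0.1, -0.2, 0.3, 0.05], 0.0  # illustrative weights only
    s, s_next = [0.0, 1.0], [0.5, 1.2]   # toy state features
    d = discriminator(w, b, s, s_next)
    print(d, style_reward(d), total_reward(1.0, style_reward(d)))
```

Consider a transition the discriminator scores at D = 1, meaning it looks like the reference data. It receives the full style reward of 1. A transition scored at D = -1 (the training target for policy samples) receives 0. In this way, the policy is rewarded for motions the discriminator cannot tell apart from the demonstrations, while the task reward term still drives task progress.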