Paper Title

Rethinking ValueDice: Does It Really Improve Performance?

Authors

Ziniu Li, Tian Xu, Yang Yu, Zhi-Quan Luo

Abstract


Since the introduction of GAIL, adversarial imitation learning (AIL) methods have attracted considerable research interest. Among these methods, ValueDice has achieved significant improvements: it beats the classical approach Behavioral Cloning (BC) in the offline setting, and it requires fewer interactions than GAIL in the online setting. Do these improvements stem from more advanced algorithm designs? We answer this question with the following conclusions. First, we show that ValueDice can reduce to BC in the offline setting. Second, we verify that overfitting exists and regularization matters in the low-data regime. Specifically, we demonstrate that with weight decay, BC also nearly matches the expert performance, as ValueDice does. The first two claims explain the superior offline performance of ValueDice. Third, we establish that ValueDice does not work when the expert trajectory is subsampled. Instead, the mentioned success of ValueDice holds when the expert trajectory is complete, in which case ValueDice is closely related to BC, which performs well as mentioned. Finally, we discuss the implications of our research for imitation learning studies beyond ValueDice.
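The abstract's second claim is that BC with weight decay nearly matches expert performance, i.e., that an L2 penalty on the policy parameters suffices in the low-data regime. As a minimal sketch (not the paper's implementation), a linear behavioral-cloning policy with weight decay can even be fit in closed form; the toy data and the penalty coefficient `lam` below are purely hypothetical:

```python
import numpy as np

# Hypothetical toy expert data: states S and actions A from a linear expert.
rng = np.random.default_rng(0)
S = rng.normal(size=(100, 4))       # expert states (100 samples, 4 features)
W_true = rng.normal(size=(4, 2))    # unknown expert policy weights
A = S @ W_true                      # expert actions

# Behavioral cloning with weight decay (ridge regression):
#   W* = argmin_W ||S W - A||^2 + lam * ||W||^2
lam = 0.1                           # hypothetical L2 / weight-decay coefficient
W = np.linalg.solve(S.T @ S + lam * np.eye(S.shape[1]), S.T @ A)
```

In deep-learning practice the same regularizer is typically applied by passing a weight-decay option to the optimizer during gradient descent rather than by solving in closed form.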
