Paper Title

Robust Domain Randomised Reinforcement Learning through Peer-to-Peer Distillation

Authors

Chenyang Zhao, Timothy Hospedales

Abstract

In reinforcement learning, domain randomisation is an increasingly popular technique for learning more general policies that are robust to domain shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimation and an unstable learning process. To address this issue, we present a peer-to-peer online distillation strategy for RL, termed P2PDRL, in which multiple workers are each assigned to a different environment and exchange knowledge through mutual regularisation based on Kullback-Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation to new environments at test time.
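
To make the mutual-regularisation idea concrete, below is a minimal PyTorch sketch of a KL-based peer-to-peer distillation term. This is an illustrative reconstruction from the abstract, not the authors' implementation: the GaussianPolicy class, the coefficient alpha, and the choice to evaluate each worker's peers on that worker's own states are all assumptions, and in the full method this term would be combined with each worker's own RL objective on its assigned randomised environment.

```python
# Minimal sketch of a mutual KL regulariser in the spirit of P2PDRL.
# Assumptions (not from the paper): diagonal Gaussian policies, a fixed
# coefficient alpha, and peers treated as fixed teachers per update step.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(nn.Module):
    """Diagonal Gaussian policy: state -> Normal(mean, std)."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> Normal:
        return Normal(self.net(obs), self.log_std.exp())


def mutual_kl_regulariser(policies, obs_batches, alpha: float = 0.1):
    """For each worker i, penalise KL(pi_i || pi_j) against every peer j,
    evaluated on worker i's own on-policy states. Returns one scalar
    regularisation loss per worker."""
    n = len(policies)
    losses = []
    for i in range(n):
        dist_i = policies[i](obs_batches[i])
        kl_terms = []
        for j in range(n):
            if j == i:
                continue
            with torch.no_grad():  # peer j acts as a fixed teacher here
                dist_j = policies[j](obs_batches[i])
            # Sum over action dims: KL between diagonal Gaussians factorises.
            kl_terms.append(kl_divergence(dist_i, dist_j).sum(-1).mean())
        losses.append(alpha * torch.stack(kl_terms).mean())
    return losses


# Usage: each worker adds its regulariser to its own RL loss, e.g.
#   total_loss_i = rl_loss_i + reg_losses[i]
policies = [GaussianPolicy(obs_dim=8, act_dim=2) for _ in range(4)]
obs_batches = [torch.randn(32, 8) for _ in range(4)]  # per-worker states
reg_losses = mutual_kl_regulariser(policies, obs_batches)
```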
