Paper Title
Efficient Scheduling of Data Augmentation for Deep Reinforcement Learning
Paper Authors
Paper Abstract
In deep reinforcement learning (RL), data augmentation is widely regarded as a tool for inducing a set of useful priors about semantic consistency and for improving sample efficiency and generalization performance. However, even when the prior is useful for generalization, distilling it into the RL agent often interferes with RL training and degrades sample efficiency. Meanwhile, the agent tends to forget the prior due to the non-stationary nature of RL. These observations suggest two extreme schedules of distillation: (i) over the entire training; or (ii) only at the end. Hence, we devise a stand-alone network distillation method to inject the consistency prior at any time (even after RL), and a simple yet efficient framework to automatically schedule the distillation. Specifically, the proposed framework first focuses on mastering the training environments, regardless of generalization, by adaptively deciding which (or no) augmentation to use for training. After this, we add distillation to extract the remaining generalization benefits from all the augmentations, which requires no additional new samples. In our experiments, we demonstrate the utility of the proposed framework, in particular the schedule that postpones augmentation to the end of RL training.
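To make the described schedule concrete, the following is a minimal Python sketch of the two-phase idea, assuming a bandit-style (UCB) selection rule for choosing which (or no) augmentation to train with, followed by a stand-alone distillation pass over already-collected observations. The `BanditScheduler` class and the `agent` interface (`run_episode`, `rl_update`, `frozen_copy`, `augment`, `distill_update`, `action_distribution`) are illustrative assumptions, not the authors' implementation.

```python
import math


class BanditScheduler:
    """UCB-style choice among {no augmentation, aug_1, ..., aug_K}.

    The UCB criterion is an illustrative assumption; the point is only that
    the augmentation used during RL updates is selected adaptively from the
    agent's own episode returns rather than fixed in advance.
    """

    def __init__(self, arms, c=1.0):
        self.counts = {a: 0 for a in arms}   # times each arm was chosen
        self.means = {a: 0.0 for a in arms}  # running mean episode return
        self.c = c                           # exploration coefficient

    def select(self):
        total = sum(self.counts.values()) + 1

        def ucb(arm):
            n = self.counts[arm]
            if n == 0:
                return math.inf  # try every arm at least once
            return self.means[arm] + self.c * math.sqrt(math.log(total) / n)

        return max(self.counts, key=ucb)

    def update(self, arm, episode_return):
        self.counts[arm] += 1
        n = self.counts[arm]
        self.means[arm] += (episode_return - self.means[arm]) / n


def train_then_distill(agent, env, augmentations, rl_episodes, distill_epochs):
    """Phase 1: RL training with an adaptively chosen (or no) augmentation.
    Phase 2: stand-alone distillation of the consistency prior after RL,
    reusing only stored observations (no new environment samples).
    The `agent` methods below are hypothetical placeholders."""
    scheduler = BanditScheduler(["none", *augmentations])
    replay = []

    # Phase 1: focus on mastering the training environments.
    for _ in range(rl_episodes):
        aug = scheduler.select()
        episode_return, observations = agent.run_episode(env, augment=aug)
        replay.extend(observations)
        scheduler.update(aug, episode_return)
        agent.rl_update(observations)

    # Phase 2: distill consistency across all augmentations, using a frozen
    # copy of the trained agent as the teacher.
    teacher = agent.frozen_copy()
    for _ in range(distill_epochs):
        for obs in replay:
            for aug in augmentations:
                agent.distill_update(
                    augmented_obs=agent.augment(obs, aug),
                    target=teacher.action_distribution(obs),
                )
    return agent
```

In this sketch, the distillation phase requires no new samples because it only replays stored observations and matches the agent's outputs on their augmented views to targets from a frozen copy of the trained policy.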