Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning

Paper Title

Online Weighted Q-Ensembles for Reduced Hyperparameter Tuning in Reinforcement Learning

Authors

Renata Garcia, Wouter Caarls

Abstract

Reinforcement learning is a promising paradigm for learning robot control, allowing complex control policies to be learned without requiring a dynamics model. However, even state-of-the-art algorithms can be difficult to tune for optimum performance. We propose employing an ensemble of multiple reinforcement learning agents, each with a different set of hyperparameters, along with a mechanism for choosing the best-performing set(s) online. In the literature, ensemble techniques are generally used to improve performance, whereas the current work specifically addresses reducing the hyperparameter tuning effort. Furthermore, our approach targets online learning on a single robotic system and does not require running multiple simulators in parallel. Although the idea is generic, Deep Deterministic Policy Gradient (DDPG) was chosen as the model, being a representative deep actor-critic method with good performance in continuous action settings but known to have high variance. We compare our Online Weighted Q-Ensemble approach to Q-average ensemble strategies from the literature, using alternate policy training as well as online training, and demonstrate the advantage of the new approach in eliminating hyperparameter tuning. The applicability to real-world systems was validated in common robotic benchmark environments: the bipedal Half Cheetah robot and the Swimmer. Online Weighted Q-Ensemble showed lower overall variance and superior results compared with Q-average ensembles using randomized parameterizations.
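The difference between a Q-average ensemble and a weighted Q-ensemble can be illustrated with a minimal sketch. The abstract does not state how the weights are computed, so the rule below (a softmax over negative running TD errors) is purely an assumption for illustration, and the names `select_action`, `update_weights`, and `temperature` are hypothetical. The critics are toy functions standing in for the Q-networks of differently parameterized DDPG agents.

```python
import math
from typing import Callable, List, Sequence

Action = Sequence[float]
Critic = Callable[[Sequence[float], Action], float]  # Q(state, action)


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def update_weights(avg_sq_td_errors: Sequence[float],
                   temperature: float = 1.0) -> List[float]:
    """Hypothetical rule (not from the paper): members with a lower
    running squared TD error receive a higher weight."""
    return softmax([-e / temperature for e in avg_sq_td_errors])


def select_action(state: Sequence[float],
                  candidates: List[Action],
                  critics: List[Critic],
                  weights: Sequence[float]) -> Action:
    """Pick the candidate action with the highest weighted ensemble Q-value.
    With uniform weights this reduces to a Q-average ensemble."""
    def ensemble_q(action: Action) -> float:
        return sum(w * q(state, action) for w, q in zip(weights, critics))
    return max(candidates, key=ensemble_q)


# Toy critics that disagree: one prefers a = +1, the other a = -1.
critic_a: Critic = lambda s, a: -(a[0] - 1.0) ** 2
critic_b: Critic = lambda s, a: -(a[0] + 1.0) ** 2

state = [0.0]
candidates = [[1.0], [-1.0]]  # one proposed action per ensemble member

# Member A has the lower running TD error, so it dominates the choice.
weights = update_weights([0.1, 2.0])
chosen = select_action(state, candidates, [critic_a, critic_b], weights)
print(weights, chosen)
```

In this toy setup the two critics cancel out under uniform weights, while non-uniform weights let the member that currently fits the data better decide the action. That is the intuition behind preferring a weighted ensemble when some hyperparameter sets are poorly tuned.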
