Paper Title
Discount Factor as a Regularizer in Reinforcement Learning
Paper Authors
Paper Abstract
Specifying a Reinforcement Learning (RL) task involves choosing a suitable planning horizon, which is typically modeled by a discount factor. It is known that applying RL algorithms with a lower discount factor can act as a regularizer, improving performance in the limited data regime. Yet the exact nature of this regularizer has not been investigated. In this work, we fill in this gap. For several Temporal-Difference (TD) learning methods, we show an explicit equivalence between using a reduced discount factor and adding an explicit regularization term to the algorithm's loss. Motivated by the equivalence, we empirically study this technique compared to standard $L_2$ regularization by extensive experiments in discrete and continuous domains, using tabular and functional representations. Our experiments suggest the regularization effectiveness is strongly related to properties of the available data, such as size, distribution, and mixing rate.
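As a rough illustration of the comparison the abstract describes (a minimal sketch, not the paper's construction), the code below contrasts tabular TD(0) run with a reduced discount factor against TD(0) run with the full discount plus an explicit L2 shrinkage term on the value estimates. The toy chain, step sizes, function names, and the exact placement of the penalty are all illustrative assumptions. One way to see why the two can behave similarly: algebraically, r + γ_r V(s') − V(s) = (r + γ V(s') − V(s)) − (γ − γ_r) V(s'), so lowering the discount acts like an extra penalty on the bootstrapped successor value.

```python
import numpy as np

def td0_reduced_discount(transitions, n_states, gamma_r, alpha=0.1, epochs=50):
    """Tabular TD(0) where regularization is induced implicitly by
    evaluating with a reduced discount factor gamma_r < gamma."""
    V = np.zeros(n_states)
    for _ in range(epochs):
        for s, r, s_next in transitions:
            td_error = r + gamma_r * V[s_next] - V[s]
            V[s] += alpha * td_error
    return V

def td0_l2_regularized(transitions, n_states, gamma, lam, alpha=0.1, epochs=50):
    """Tabular TD(0) with the full discount gamma plus an explicit
    L2 shrinkage term lam * V[s] subtracted from each update
    (illustrative assumption, not the paper's exact regularizer)."""
    V = np.zeros(n_states)
    for _ in range(epochs):
        for s, r, s_next in transitions:
            td_error = r + gamma * V[s_next] - V[s]
            V[s] += alpha * (td_error - lam * V[s])
    return V

# Toy 3-state chain: a fixed batch of (state, reward, next_state) samples,
# standing in for the limited-data regime discussed in the abstract.
batch = [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, 0)]
V_short = td0_reduced_discount(batch, n_states=3, gamma_r=0.8)
V_ridge = td0_l2_regularized(batch, n_states=3, gamma=0.95, lam=0.05)
print(V_short, V_ridge)
```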