Time-Variant Variational Transfer for Value Functions

Paper Title

Time-Variant Variational Transfer for Value Functions

Authors

Giuseppe Canonaco, Andrea Soprani, Manuel Roveri, Marcello Restelli

Abstract

In most transfer learning approaches to reinforcement learning (RL), the distribution over tasks is assumed to be stationary, so the target and source tasks are i.i.d. samples from the same distribution. In this work, we consider the problem of transferring value functions through a variational method when the distribution that generates the tasks is time-variant, and we propose a solution that leverages the temporal structure inherent in the task-generating process. Furthermore, by means of a finite-sample analysis, we compare this solution theoretically with its time-invariant version. Finally, we provide an experimental evaluation of the proposed technique with three distinct temporal dynamics in three different RL environments.
