Paper Title
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces
Paper Authors
Paper Abstract
In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, we analyze its regret and validate it experimentally.
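To illustrate how time-dependent kernels can subsume both sliding windows and exponential discounting, here is a minimal sketch. It is not the paper's actual construction: the function names, the Gaussian spatial kernel, and all parameter values are illustrative assumptions. The idea is that each past episode `s` contributes to the model at current episode `t` with a weight that is a product of a temporal term (how old the data is) and a spatial term (how close the queried state-action pair is under the metric).

```python
import math

def temporal_weight(t, s, mode="exponential", window=50, gamma=0.99):
    """Down-weight data observed in past episode s when at episode t.

    Two classical special cases of a time-dependent kernel:
    - "sliding": keep only the last `window` episodes (weight 0/1);
    - "exponential": geometrically discount older episodes by `gamma`.
    """
    age = t - s
    if mode == "sliding":
        return 1.0 if age < window else 0.0
    return gamma ** age

def spatial_weight(x, y, bandwidth=1.0):
    """Gaussian kernel on the state-action metric (Euclidean distance
    here, purely for illustration)."""
    d = math.dist(x, y)
    return math.exp(-((d / bandwidth) ** 2))

def time_dependent_kernel(x, t, y, s, **temporal_kwargs):
    """Product kernel: spatial similarity times temporal recency.

    Recent, nearby observations get weight close to 1; distant or
    stale observations get weight close to 0.
    """
    return spatial_weight(x, y) * temporal_weight(t, s, **temporal_kwargs)
```

With `mode="sliding"` this reduces to a sliding-window estimator on a metric space, and with `mode="exponential"` to exponentially discounted averaging, matching the two prior approaches the abstract says KeRNS generalizes.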