Paper title

Concentration bounds for SSP Q-learning for average cost MDPs

Authors

Haque, Shaan Ul; Borkar, Vivek

Abstract

We derive a concentration bound for a Q-learning algorithm for average-cost Markov decision processes (MDPs) that is based on an equivalent stochastic shortest path (SSP) problem, and compare it numerically with the alternative scheme based on relative value iteration (RVI).
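
The abstract names the two schemes but not their update rules. As a hedged illustration only, the Python sketch below implements one common form of SSP Q-learning for average-cost MDPs. A reference state s, assumed recurrent under every policy, is treated as terminal. Q is updated on a fast timescale for the shifted costs g(i,u) - lam, and the average-cost estimate lam is updated on a slower timescale by lam += b_n * min_v Q(s,v). For comparison, it also runs RVI Q-learning with the offset f(Q) = Q(i0,u0), plus model-based relative value iteration as an exact reference. The toy MDP, the step sizes a_n = (n+1)^(-0.6) and b_n = 1/(n+1), the synchronous sampling of one next state per state-action pair, and all function names are assumptions made for this sketch. It is not the authors' code and does not compute the paper's concentration bound.

```python
import numpy as np


def sample_next_states(P, rng):
    """Draw one next state j ~ P[i, u, :] for every state-action pair (i, u)."""
    cdf = np.cumsum(P, axis=2)
    cdf[..., -1] = 1.0  # guard against round-off in the row sums
    u = rng.random(P.shape[:2] + (1,))
    return (u < cdf).argmax(axis=2)  # first index whose CDF exceeds u


def ssp_q_learning(P, g, s=0, n_iter=20000, seed=0):
    """Two-timescale SSP Q-learning (sketch). Returns (Q, average-cost estimate).

    Fast: Q(i,u) += a_n [g(i,u) - lam + 1{j != s} min_v Q(j,v) - Q(i,u)]
    Slow: lam    += b_n min_v Q(s,v),  with b_n / a_n -> 0
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros(g.shape)
    lam = 0.0
    for n in range(n_iter):
        a_n, b_n = 1.0 / (n + 1) ** 0.6, 1.0 / (n + 1)
        j = sample_next_states(P, rng)
        # Reaching the reference state s ends the SSP episode: no continuation cost.
        cont = np.where(j == s, 0.0, Q.min(axis=1)[j])
        Q += a_n * (g - lam + cont - Q)
        lam += b_n * Q[s].min()
    return Q, lam


def rvi_q_learning(P, g, ref=(0, 0), n_iter=20000, seed=0):
    """RVI Q-learning (sketch) with offset f(Q) = Q[ref]; f(Q) estimates the average cost."""
    rng = np.random.default_rng(seed)
    Q = np.zeros(g.shape)
    for n in range(n_iter):
        a_n = 1.0 / (n + 1) ** 0.6
        j = sample_next_states(P, rng)
        Q += a_n * (g + Q.min(axis=1)[j] - Q[ref] - Q)
    return Q, Q[ref]


def relative_value_iteration(P, g, s=0, n_iter=1000):
    """Model-based relative value iteration, used only as an exact reference value."""
    h = np.zeros(g.shape[0])
    for _ in range(n_iter):
        Th = (g + P @ h).min(axis=1)  # Bellman operator applied to h
        h = Th - Th[s]  # renormalise so that h(s) = 0
    return Th[s]


# Hypothetical toy MDP: 2 states, 2 actions, uniform transitions, so the
# reference state s = 0 is recurrent under every policy.
P = np.full((2, 2, 2), 0.5)
g = np.array([[1.0, 2.0], [3.0, 0.5]])

if __name__ == "__main__":
    print("relative value iteration:", relative_value_iteration(P, g))  # 0.75 for this MDP
    print("SSP Q-learning estimate: ", ssp_q_learning(P, g)[1])
    print("RVI Q-learning estimate: ", rvi_q_learning(P, g)[1])
```

Running the script prints the exact optimal average cost of the toy MDP first. That cost is 0.75, attained by choosing action 0 in state 0 and action 1 in state 1. The two stochastic estimates follow. Updating every state-action pair at each iteration keeps the sketch short; a single-trajectory (asynchronous) variant would update only the pair visited at each step.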
