Paper title
Concentration bounds for SSP Q-learning for average cost MDPs
Paper authors
Paper abstract
We derive a concentration bound for a Q-learning algorithm for average cost Markov decision processes based on an equivalent shortest path problem, and compare it numerically with the alternative scheme based on relative value iteration.