Paper title
Convex Hull Monte-Carlo Tree Search
Paper authors
Paper abstract
This work investigates Monte-Carlo planning for agents in stochastic environments with multiple objectives. We propose the Convex Hull Monte-Carlo Tree Search (CHMCTS) framework, which builds upon Trial-Based Heuristic Tree Search and Convex Hull Value Iteration (CHVI), as a solution to multi-objective planning in large environments. Moreover, we consider how to pose the problem of approximating multi-objective planning solutions as a contextual multi-armed bandit problem, giving a principled motivation for how to select actions from the view of contextual regret. This leads us to use Contextual Zooming for action selection, yielding Zooming CHMCTS. We evaluate our algorithm using the Generalised Deep Sea Treasure environment, demonstrating that Zooming CHMCTS can achieve sublinear contextual regret and scales better than CHVI on a given computational budget.
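To make two terms from the abstract concrete, the following is a minimal sketch and not the paper's implementation. It assumes two objectives, treats the context as a non-negative weight vector used for linear scalarisation (an assumption on our part), and uses made-up value vectors. The function names `scalarise`, `convex_coverage_set`, and `contextual_regret` are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, float]  # a two-objective value vector (illustrative only)


def scalarise(w: Vec, v: Vec) -> float:
    """Linear scalarisation: the value of vector v under weight (context) w."""
    return w[0] * v[0] + w[1] * v[1]


def convex_coverage_set(points: List[Vec]) -> List[Vec]:
    """Keep only vectors that maximise w.v for some weight w >= 0.

    Two-objective case only: this is the upper-right part of the convex hull.
    """
    # Sort by objective 0 descending, ties broken by objective 1 descending.
    pts = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    hull: List[Vec] = []
    for p in pts:
        # Skip vectors that are no better in objective 1 than the last kept one;
        # they are (weakly) dominated because objective 0 is already no larger.
        if hull and p[1] <= hull[-1][1]:
            continue
        # Pop vectors that lie on or below the segment joining their neighbours.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def contextual_regret(w: Vec, chosen: Vec, points: List[Vec]) -> float:
    """Gap between the best achievable scalarised value and the chosen one."""
    best = max(scalarise(w, v) for v in points)
    return best - scalarise(w, chosen)


# Hypothetical value vectors, not taken from the paper's experiments.
points = [(10.0, 0.0), (7.0, 7.0), (4.0, 4.0), (0.0, 10.0), (3.0, 8.0)]
hull = convex_coverage_set(points)
regret = contextual_regret((0.5, 0.5), (3.0, 8.0), points)
```

Here `hull` keeps three of the five vectors, because (4, 4) and (3, 8) are never optimal under any non-negative weighting. The regret for weight (0.5, 0.5) when choosing (3, 8) is 7.0 - 5.5 = 1.5. Under this reading, a sublinear contextual regret means the sum of such gaps over trials grows more slowly than the number of trials.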