Paper Title
Challenges to Solving Combinatorially Hard Long-Horizon Deep RL Tasks
Paper Authors
Paper Abstract
Deep reinforcement learning has shown promise in discrete domains requiring complex reasoning, including games such as Chess, Go, and Hanabi. However, this type of reasoning is less often observed in long-horizon, continuous domains with high-dimensional observations, where instead RL research has predominantly focused on problems with simple high-level structure (e.g., opening a drawer or moving a robot as fast as possible). Inspired by combinatorially hard optimization problems, we propose a set of robotics tasks which admit many distinct solutions at the high level, but require reasoning about states and rewards thousands of steps into the future for the best performance. Critically, while RL has traditionally suffered on complex, long-horizon tasks due to sparse rewards, our tasks are carefully designed to be solvable without specialized exploration. Nevertheless, our investigation finds that standard RL methods often neglect long-term effects due to discounting, while general-purpose hierarchical RL approaches struggle unless additional abstract domain knowledge can be exploited.
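The abstract's claim that discounting causes standard RL methods to neglect long-term effects can be made concrete with a back-of-the-envelope calculation. The sketch below (not from the paper; the discount factor 0.99 is an assumed, commonly used value) shows how the weight γ^t assigned to a future reward collapses over the thousands-of-steps horizons the tasks require:

```python
# Illustrative sketch: how much a reward t steps in the future
# contributes to the discounted return, for a typical discount factor.
gamma = 0.99  # assumed discount factor, common in deep RL practice

for horizon in (10, 100, 1000, 5000):
    weight = gamma ** horizon
    print(f"reward {horizon:>4} steps ahead is weighted by {weight:.2e}")
```

At 1000 steps the weight is already on the order of 1e-5, so a reward signal that far in the future is effectively invisible to a discounted objective, matching the failure mode the authors report for standard RL methods.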