Paper title
Improving Sample Efficiency and Multi-Agent Communication in RL-based Train Rescheduling
Paper authors
Paper abstract
We present preliminary results from our sixth-placed entry to the Flatland international competition for train rescheduling. These include two improvements to reinforcement learning (RL) training efficiency and two hypotheses regarding the prospects of deep RL for complex real-world control tasks: first, that current state-of-the-art policy gradient methods seem inappropriate for high-consequence environments; second, that learning explicit communication actions (an emerging machine-to-machine language, so to speak) might offer a remedy. These hypotheses need to be confirmed by future work. If confirmed, they hold promise for optimizing highly efficient logistics ecosystems such as the Swiss Federal Railways network.