Paper Title
Bilevel Learning Model Towards Industrial Scheduling
Paper Authors
Paper Abstract
Automatic industrial scheduling, which optimizes the sequence of jobs over limited resources, is widely needed in manufacturing industries. However, existing scheduling systems rely heavily on heuristic algorithms, which either generate ineffective solutions or become computationally inefficient as the job scale increases. It is therefore important to develop new large-scale algorithms that are not only efficient and effective, but also capable of satisfying the complex constraints that arise in practice. In this paper, we propose a Bilevel Deep reinforcement learning Scheduler, \textit{BDS}, in which the upper level is responsible for exploring an initial global sequence, while the lower level aims at exploitation through partial-sequence refinements; the two levels are connected by a sliding-window sampling mechanism. In our implementation, a Double Deep Q-Network (DDQN) is used at the upper level and a Graph Pointer Network (GPN) at the lower level. After establishing a theoretical guarantee for the convergence of BDS, we evaluate it in an industrial automatic-warehouse scenario with up to $5000$ jobs per production line. The results show that our proposed BDS significantly outperforms the two most widely used heuristics, three strong deep networks, and another bilevel baseline approach. In particular, compared with the greedy-based heuristic most widely used in the real world, which takes nearly an hour, our BDS decreases the makespan by 27.5\%, 28.6\%, and 22.1\% on the 3 largest datasets respectively, with a computational time of less than 200 seconds.
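The bilevel structure described above can be sketched as a simple loop: an upper-level policy proposes a global job order, a sliding window samples contiguous partial sequences from it, and a lower-level policy refines each window in place. The sketch below is illustrative only: all function names are hypothetical, and cheap stand-ins (a seeded shuffle for the DDQN upper level, shortest-processing-time-first for the GPN lower level, total flow time as a surrogate objective) replace the paper's learned components.

```python
import random


def upper_level_sequence(num_jobs, seed=0):
    """Stand-in for the DDQN upper level: propose an initial global job order."""
    rng = random.Random(seed)
    seq = list(range(num_jobs))
    rng.shuffle(seq)
    return seq


def lower_level_refine(window, proc_times):
    """Stand-in for the GPN lower level: reorder one sampled partial sequence.

    Shortest-processing-time-first replaces the learned refinement policy.
    """
    return sorted(window, key=lambda j: proc_times[j])


def total_flow_time(seq, proc_times):
    """Surrogate objective: sum of job completion times on a single line."""
    t, total = 0, 0
    for j in seq:
        t += proc_times[j]
        total += t
    return total


def bilevel_schedule(proc_times, window=4, stride=2):
    """Connect the two levels by sliding a window over the global sequence."""
    seq = upper_level_sequence(len(proc_times))
    for start in range(0, len(seq) - window + 1, stride):
        seq[start:start + window] = lower_level_refine(
            seq[start:start + window], proc_times)
    return seq


if __name__ == "__main__":
    times = [5, 3, 8, 2, 7, 1, 6, 4, 9, 2]
    init = upper_level_sequence(len(times))
    refined = bilevel_schedule(times)
    print(total_flow_time(init, times), total_flow_time(refined, times))
```

Because sorting a contiguous window by processing time never changes completion times outside the window, each lower-level pass weakly improves the surrogate objective, mirroring the exploration-then-exploitation split the abstract describes.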