Paper Title

TAP-Net: Transport-and-Pack using Reinforcement Learning

Paper Authors

Ruizhen Hu, Juzhan Xu, Bin Chen, Minglun Gong, Hao Zhang, Hui Huang

Paper Abstract

We introduce the transport-and-pack (TAP) problem, a frequently encountered instance of real-world packing, and develop a neural optimization solution based on reinforcement learning. Given an initial spatial configuration of boxes, we seek an efficient method to iteratively transport and pack the boxes compactly into a target container. Due to obstruction and accessibility constraints, our problem has to add a new search dimension, i.e., finding an optimal transport sequence, to the already immense search space for packing alone. Using a learning-based approach, a trained network can learn and encode solution patterns to guide the solution of new problem instances instead of executing an expensive online search. In our work, we represent the transport constraints using a precedence graph and train a neural network, coined TAP-Net, using reinforcement learning to reward efficient and stable packing. The network is built on an encoder-decoder architecture, where the encoder employs convolution layers to encode the box geometry and precedence graph, and the decoder is a recurrent neural network (RNN) that takes the current encoder output, as well as the current box packing state of the target container, as input and outputs the next box to pack, as well as its orientation. We train our network on randomly generated initial box configurations, without supervision, via policy gradients to learn optimal TAP policies that maximize packing efficiency and stability. We demonstrate the performance of TAP-Net on a variety of examples, evaluating the network through ablation studies and comparisons to baselines and alternative network designs. We also show that our network generalizes well to larger problem instances when trained on small-sized inputs.
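The abstract describes an encoder-decoder policy trained with policy gradients. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of that general shape under assumed interfaces: each box is represented by a fixed-length feature vector (geometry plus its row of the precedence graph), the container state is a flattened height map, and `env` is a hypothetical packing environment whose `step` returns the efficiency-and-stability reward. All layer sizes and names are illustrative.

```python
# Minimal sketch (not the authors' code) of a TAP-Net-style encoder-decoder
# policy with a REINFORCE update. Shapes and the `env` interface are assumptions.
import torch
import torch.nn as nn

class BoxEncoder(nn.Module):
    """1D convolution over per-box features (geometry + precedence-graph row)."""
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, hidden_dim, kernel_size=1)

    def forward(self, boxes):                     # boxes: (batch, num_boxes, feat_dim)
        x = boxes.transpose(1, 2)                 # -> (batch, feat_dim, num_boxes)
        return self.conv(x).transpose(1, 2)       # -> (batch, num_boxes, hidden_dim)

class PackingDecoder(nn.Module):
    """GRU decoder: consumes the container state and scores each box encoding
    against the hidden state (pointer-network style), per candidate orientation."""
    def __init__(self, hidden_dim, state_dim, num_orientations):
        super().__init__()
        self.state_fc = nn.Linear(state_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.score = nn.Linear(2 * hidden_dim, num_orientations)

    def forward(self, box_enc, container_state, hidden):
        hidden = self.rnn(self.state_fc(container_state), hidden)   # (batch, hidden)
        h_exp = hidden.unsqueeze(1).expand_as(box_enc)               # (batch, num_boxes, hidden)
        logits = self.score(torch.cat([box_enc, h_exp], dim=-1))     # (batch, num_boxes, num_orient)
        return logits.flatten(1), hidden           # joint (box, orientation) logits

def reinforce_step(encoder, decoder, optimizer, boxes, env, steps):
    """One policy-gradient update over a batch of packing episodes.
    `env` is a hypothetical environment: reset() -> state, step(action) -> (state, reward)."""
    batch = boxes.size(0)
    box_enc = encoder(boxes)
    hidden = torch.zeros(batch, box_enc.size(-1))
    state = env.reset()                            # (batch, state_dim) height map
    log_probs, rewards = [], []
    for _ in range(steps):
        logits, hidden = decoder(box_enc, state, hidden)
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()                     # index = box_id * num_orient + orientation
        log_probs.append(dist.log_prob(action))
        state, reward = env.step(action)           # reward: packing efficiency + stability
        rewards.append(reward)
    ret = torch.stack(rewards).sum(0)              # total reward per episode
    loss = -(torch.stack(log_probs).sum(0) * (ret - ret.mean())).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

A faithful implementation would additionally mask already-packed boxes and precedence-violating choices before sampling, and would likely use a learned baseline rather than the simple mean-reward baseline shown here.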
