学习通过硬度自适应课程解决旅行推销员问题

论文标题

学习通过硬度自适应课程解决旅行推销员问题

Learning to Solve Travelling Salesman Problem with Hardness-adaptive Curriculum

论文作者

Zhang, Zeyang, Zhang, Ziwei, Wang, Xin, Zhu, Wenwu

论文摘要

已经提出了各种神经网络模型来解决组合优化问题，例如旅行推销员问题（TSP）。现有的基于学习的TSP方法采用了一个简单的设置，即训练和测试数据是独立的且分布相同的。但是，现有文献在培训和测试数据具有不同的分布时无法解决TSP实例。具体而言，我们发现不同的训练和测试分布将导致更困难的TSP实例，即，该模型获得的解决方案具有从最佳解决方案中的较大差距。为了解决这一问题，在这项工作中，我们研究了基于学习的TSP方法，当培训和测试数据使用自适应硬度分布不同，即TSP实例对于求解器来说是多么困难。这个问题是具有挑战性的，因为（1）定量定义硬度测量是不乏味的；（2）在模型培训时有效且连续地产生足够的硬TSP实例；（3）充分利用具有不同级别硬度的实例来学习更强大的TSP求解器。为了解决这些挑战，我们首先提出了一个有原则的硬度测量，以量化TSP实例的硬度。然后，我们提出了一个硬度自适应发生器，以生成不同硬度的实例。我们进一步提出了一个充分利用这些实例来训练TSP求解器的课程学习者。实验表明，我们的硬度自适应发生器可以比现有方法更难生成十倍，并且我们提出的方法在最佳差距方面对最先进的模型实现了显着改善。

Various neural network models have been proposed to tackle combinatorial optimization problems such as the travelling salesman problem (TSP). Existing learning-based TSP methods adopt a simple setting that the training and testing data are independent and identically distributed. However, the existing literature fails to solve TSP instances when training and testing data have different distributions. Concretely, we find that different training and testing distribution will result in more difficult TSP instances, i.e., the solution obtained by the model has a large gap from the optimal solution. To tackle this problem, in this work, we study learning-based TSP methods when training and testing data have different distributions using adaptive-hardness, i.e., how difficult a TSP instance can be for a solver. This problem is challenging because it is non-trivial to (1) define hardness measurement quantitatively; (2) efficiently and continuously generate sufficiently hard TSP instances upon model training; (3) fully utilize instances with different levels of hardness to learn a more powerful TSP solver. To solve these challenges, we first propose a principled hardness measurement to quantify the hardness of TSP instances. Then, we propose a hardness-adaptive generator to generate instances with different hardness. We further propose a curriculum learner fully utilizing these instances to train the TSP solver. Experiments show that our hardness-adaptive generator can generate instances ten times harder than the existing methods, and our proposed method achieves significant improvement over state-of-the-art models in terms of the optimality gap.

下载PDF全文

下载文献需遵守相关版权规定

论文标题