论文标题
神经多目标组合优化的帕累托设置学习
Pareto Set Learning for Neural Multi-objective Combinatorial Optimization
论文作者
论文摘要
在许多实际应用中可以找到多物体组合组合优化(MOCO)问题。但是,完全解决这些问题将非常具有挑战性,尤其是当它们是NP-HARD时。在过去的几十年中,已经提出了许多手工制作的启发式方法来解决不同的MOCO问题。在这项工作中,我们概括了神经组合优化的概念,并开发了一种基于学习的方法,以近似于给定的MOCO问题的整个帕累托,而无需进一步的搜索程序。我们提出了一个单个偏好条件模型,以直接为任何权衡偏好生成近似的帕累托解决方案,并设计有效的多主体强化学习算法来训练该模型。我们提出的方法可以作为基于学习的扩展,用于广泛使用的基于分解的多主体进化算法(MOEA/D)。它使用单个模型来适应所有可能的偏好,而其他方法则使用有限数量的解决方案来近似帕累托集。实验结果表明,我们所提出的方法在多目标旅行人员问题,多目标车辆路由问题以及多物原理背包问题方面大大优于其他一些方法,就解决方案质量,速度和模型效率而言。
Multiobjective combinatorial optimization (MOCO) problems can be found in many real-world applications. However, exactly solving these problems would be very challenging, particularly when they are NP-hard. Many handcrafted heuristic methods have been proposed to tackle different MOCO problems over the past decades. In this work, we generalize the idea of neural combinatorial optimization, and develop a learning-based approach to approximate the whole Pareto set for a given MOCO problem without further search procedure. We propose a single preference-conditioned model to directly generate approximate Pareto solutions for any trade-off preference, and design an efficient multiobjective reinforcement learning algorithm to train this model. Our proposed method can be treated as a learning-based extension for the widely-used decomposition-based multiobjective evolutionary algorithm (MOEA/D). It uses a single model to accommodate all the possible preferences, whereas other methods use a finite number of solutions to approximate the Pareto set. Experimental results show that our proposed method significantly outperforms some other methods on the multiobjective traveling salesman problem, multiobjective vehicle routing problem, and multiobjective knapsack problem in terms of solution quality, speed, and model efficiency.