Paper Title
MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning
Paper Authors
Paper Abstract
We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL). Existing NAS methods typically define different search spaces according to different tasks. In order to adapt to different task combinations (i.e., task sets), we disentangle the GP-MTL networks into single-task backbones (optionally encoding the task priors), and a hierarchical and layerwise feature sharing/fusing scheme across them. This enables us to design a novel and general task-agnostic search space, which inserts cross-task edges (i.e., feature fusion connections) into fixed single-task network backbones. Moreover, we also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures and the final evaluation architecture. This is realized with a minimum entropy regularization on the architecture weights during the search phase, which makes the architecture weights converge to near-discrete values and therefore yields a single model. As a result, our searched model can be directly used for evaluation without (re-)training from scratch. We perform extensive experiments using different single-task backbones on various task sets, demonstrating the promising performance obtained by exploiting the hierarchical and layerwise features, as well as the desirable generalizability to i) different task sets and ii) different single-task backbones. The code of our paper is available at https://github.com/bhpfelix/MTLNAS.
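As a rough illustration of the two ingredients summarized in the abstract, the PyTorch-style sketch below inserts sigmoid-weighted cross-task fusion edges between two fixed single-task backbones and penalizes the binary entropy of those edge weights so they drift toward near-discrete values during the search. The class `FusionSearchNet`, the per-stage edge layout, and the loss weighting are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

```python
# Minimal sketch, assuming a continuous relaxation of cross-task fusion edges
# between two fixed single-task backbones, plus an entropy penalty on the
# architecture weights. Names and edge layout are hypothetical.
import torch
import torch.nn as nn


class FusionSearchNet(nn.Module):
    def __init__(self, backbone_a, backbone_b, channels):
        super().__init__()
        # Two fixed single-task backbones, each given as a list of stages.
        self.backbone_a = nn.ModuleList(backbone_a)
        self.backbone_b = nn.ModuleList(backbone_b)
        # One architecture weight (logit) per candidate cross-task edge;
        # here: stage i of task A -> stage i of task B.
        n_edges = len(backbone_a)
        self.arch_logits = nn.Parameter(torch.zeros(n_edges))
        # 1x1 convs that adapt the fused features (assumes equal channel widths).
        self.fuse = nn.ModuleList(
            [nn.Conv2d(channels, channels, kernel_size=1) for _ in range(n_edges)]
        )

    def edge_probs(self):
        # Probability that each candidate cross-task edge is kept.
        return torch.sigmoid(self.arch_logits)

    def forward(self, x):
        fa, fb = x, x  # both backbones consume the same input image
        for i, (stage_a, stage_b) in enumerate(zip(self.backbone_a, self.backbone_b)):
            fa = stage_a(fa)
            fb = stage_b(fb)
            # Soft (relaxed) feature fusion along the candidate edge.
            p = torch.sigmoid(self.arch_logits[i])
            fb = fb + p * self.fuse[i](fa)
        return fa, fb


def entropy_regularizer(probs, eps=1e-8):
    # Binary entropy of each edge probability; minimizing it pushes the relaxed
    # architecture weights toward near-discrete (0/1) values, so the searched
    # network can be evaluated directly without retraining from scratch.
    h = -(probs * (probs + eps).log() + (1 - probs) * (1 - probs + eps).log())
    return h.mean()
```

A search step would then minimize a combined objective along the lines of `loss_a + loss_b + lam * entropy_regularizer(model.edge_probs())`, where `lam` is a hypothetical weighting coefficient that could be annealed so the edge weights only harden toward 0/1 late in the search.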