Multi-path Neural Networks for On-device Multi-domain Visual Classification

Paper title

Multi-path Neural Networks for On-device Multi-domain Visual Classification

Authors

Wang, Qifei; Ke, Junjie; Greaves, Joshua; Chu, Grace; Bender, Gabriel; Sbaiz, Luciano; Go, Alec; Howard, Andrew; Yang, Feng; Yang, Ming-Hsuan; Gilbert, Jeff; Milanfar, Peyman

Abstract

Learning multiple domains/tasks with a single model is important for improving data efficiency and lowering inference cost for numerous vision tasks, especially on resource-constrained mobile devices. However, hand-crafting a multi-domain/task model can be both tedious and challenging. This paper proposes a novel approach to automatically learn a multi-path network for multi-domain visual classification on mobile devices. The proposed multi-path network is learned from neural architecture search by applying one reinforcement learning controller for each domain to select the best path in the super-network created from a MobileNetV3-like search space. An adaptive balanced domain prioritization algorithm is proposed to balance optimizing the joint model on multiple domains simultaneously. The determined multi-path model selectively shares parameters across domains in shared nodes while keeping domain-specific parameters within non-shared nodes in individual domain paths. This approach effectively reduces the total number of parameters and FLOPS, encouraging positive knowledge transfer while mitigating negative interference across domains. Extensive evaluations on the Visual Decathlon dataset demonstrate that the proposed multi-path model achieves state-of-the-art performance in terms of accuracy, model size, and FLOPS against other approaches using MobileNetV3-like architectures. Furthermore, the proposed method improves average accuracy over learning single-domain models individually, and reduces the total number of parameters and FLOPS by 78% and 32% respectively, compared to the approach that simply bundles single-domain models for multi-domain learning.
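The parameter saving described in the abstract comes from counting a node once when several domain paths select it. The sketch below illustrates only that counting argument. It is not the paper's search procedure: the super-network layout, node names, parameter counts, and domain paths are all hypothetical values chosen for illustration, and the paths are fixed by hand rather than chosen by a reinforcement learning controller.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical super-network: one dict per layer, mapping a candidate
# node name to its parameter count. All numbers are made up.
SUPERNET: List[Dict[str, int]] = [
    {"a": 100, "b": 150},
    {"a": 200, "b": 250},
    {"a": 300, "b": 350},
]

# Hypothetical per-domain paths: one selected node per layer.
PATHS: Dict[str, List[str]] = {
    "domain_x": ["a", "a", "a"],
    "domain_y": ["a", "a", "b"],
}


def multipath_params(paths: Dict[str, List[str]]) -> int:
    """Parameters of the multi-path model: a node shared by several
    domains is counted once."""
    used: Set[Tuple[int, str]] = set()
    for path in paths.values():
        for layer, node in enumerate(path):
            used.add((layer, node))
    return sum(SUPERNET[layer][node] for layer, node in used)


def bundled_params(paths: Dict[str, List[str]]) -> int:
    """Parameters when every domain keeps its own separate model."""
    return sum(
        SUPERNET[layer][node]
        for path in paths.values()
        for layer, node in enumerate(path)
    )


def shared_nodes(paths: Dict[str, List[str]]) -> Set[Tuple[int, str]]:
    """(layer, node) pairs selected by more than one domain."""
    counts: Dict[Tuple[int, str], int] = {}
    for path in paths.values():
        for layer, node in enumerate(path):
            counts[(layer, node)] = counts.get((layer, node), 0) + 1
    return {key for key, n in counts.items() if n > 1}


if __name__ == "__main__":
    print("multi-path:", multipath_params(PATHS))
    print("bundled:   ", bundled_params(PATHS))
    print("shared:    ", sorted(shared_nodes(PATHS)))
```

In this toy setup the two domains share the first two layers and diverge in the last one, so the multi-path count is smaller than the bundled count. The reductions reported in the abstract come from the paper's own experiments and cannot be reproduced from this sketch.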
