Paper Title
Generalizing Few-Shot NAS with Gradient Matching
Paper Authors
Paper Abstract
Efficient performance estimation of architectures drawn from large search spaces is essential to Neural Architecture Search. One-Shot methods tackle this challenge by training one supernet to approximate the performance of every architecture in the search space via weight-sharing, thereby drastically reducing the search cost. However, due to coupled optimization between child architectures caused by weight-sharing, One-Shot supernet's performance estimation could be inaccurate, leading to degraded search outcomes. To address this issue, Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separated sub-supernets via edge-wise (layer-wise) exhaustive partitioning. Since each partition of the supernet is not equally important, it necessitates the design of a more effective splitting criterion. In this work, we propose a gradient matching score (GM) that leverages gradient information at the shared weight for making informed splitting decisions. Intuitively, gradients from different child models can be used to identify whether they agree on how to update the shared modules, and subsequently to decide if they should share the same weight. Compared with exhaustive partitioning, the proposed criterion significantly reduces the branching factor per edge. This allows us to split more edges (layers) for a given budget, resulting in substantially improved performance as NAS search spaces usually include dozens of edges (layers). Extensive empirical evaluations of the proposed method on a wide range of search spaces (NASBench-201, DARTS, MobileNet Space), datasets (CIFAR-10, CIFAR-100, ImageNet) and search algorithms (DARTS, SNAS, RSPS, ProxylessNAS, OFA) demonstrate that it significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
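To make the "gradient agreement" intuition concrete, below is a minimal sketch in PyTorch of how a gradient matching score between two child models could be computed as the cosine similarity of their gradients at a shared module. The helper names (grad_at_shared_weight, gm_score, shared_module) are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# Minimal sketch (assumes PyTorch): measure whether two child models agree
# on how to update a module whose weights they share. A low score suggests
# the children conflict and the corresponding edge is a good candidate to split.
import torch
import torch.nn.functional as F

def grad_at_shared_weight(loss, shared_module):
    """Flattened gradient of one child model's loss w.r.t. the shared module's weights."""
    grads = torch.autograd.grad(
        loss, shared_module.parameters(), retain_graph=True, allow_unused=True
    )
    return torch.cat([g.flatten() for g in grads if g is not None])

def gm_score(loss_a, loss_b, shared_module):
    """Cosine similarity between two child models' gradients at the shared weights.

    High similarity: the children push the shared weights in a consistent
    direction and may keep sharing them. Low similarity: their updates
    conflict, so assigning them to separate sub-supernets is preferable.
    """
    g_a = grad_at_shared_weight(loss_a, shared_module)
    g_b = grad_at_shared_weight(loss_b, shared_module)
    return F.cosine_similarity(g_a, g_b, dim=0).item()
```

In use, one would evaluate two child architectures on the same mini-batch to obtain loss_a and loss_b, then call gm_score on each candidate shared module; edges whose candidate operations yield the lowest scores are split first, which is how the criterion keeps the branching factor per edge small compared with exhaustive partitioning.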