Paper Title

Adaptive Channel Allocation for Robust Differentiable Architecture Search

Paper Authors

Chao Li, Jia Ning, Han Hu, Kun He

Paper Abstract

Differentiable ARchiTecture Search (DARTS) has attracted much attention due to its simplicity and significant improvement in efficiency. However, the excessive accumulation of skip connections when training epochs become large makes it suffer from weak stability and low robustness, thus limiting its practical applications. Many works have attempted to restrict the accumulation of skip connections through indicators or manual design; these methods, however, are susceptible to human priors and hyper-parameters. In this work, we suggest a more subtle and direct approach that no longer explicitly searches for skip connections in the search stage, based on the paradox that skip connections were proposed to guarantee the performance of very deep networks, whereas the networks in the search stage of differentiable architecture search are actually very shallow. Instead, by introducing a channel importance ranking and channel allocation strategy, skip connections are implicitly searched and automatically refill unimportant channels in the evaluation stage. Our method, dubbed the Adaptive Channel Allocation (ACA) strategy, is a general-purpose approach for differentiable architecture search that works universally in DARTS variants without introducing human priors, indicators, or hyper-parameters. Extensive experiments on various datasets and DARTS variants verify that the ACA strategy is the most effective among existing methods at improving robustness and dealing with the collapse issue when training epochs become large.
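
The abstract only sketches the mechanism at a high level. Below is a minimal, speculative PyTorch illustration (not the authors' implementation) of the idea it describes: rank channels of a searched operation by an importance score and let the least important ones be refilled by the skip (identity) path at evaluation time. The function names `channel_importance` and `aca_mix`, the L1-norm importance proxy, and the `keep_ratio` parameter are all assumptions made for illustration.

```python
# Speculative sketch of channel importance ranking + channel allocation,
# loosely following the abstract's description; not the paper's actual code.
import torch


def channel_importance(weight: torch.Tensor) -> torch.Tensor:
    """Score each output channel of a conv weight by its L1 norm (one plausible proxy)."""
    # weight shape: (C_out, C_in, k, k)
    return weight.abs().sum(dim=(1, 2, 3))


def aca_mix(op_out: torch.Tensor, skip_out: torch.Tensor,
            importance: torch.Tensor, keep_ratio: float = 0.75) -> torch.Tensor:
    """Keep the top `keep_ratio` channels from the searched op; refill the
    remaining (unimportant) channels with the skip-connection features."""
    num_channels = op_out.size(1)
    num_keep = max(1, int(keep_ratio * num_channels))
    keep_idx = importance.topk(num_keep).indices
    mask = torch.zeros(num_channels, dtype=torch.bool, device=op_out.device)
    mask[keep_idx] = True
    mask = mask.view(1, -1, 1, 1)  # broadcast over batch and spatial dims
    return torch.where(mask, op_out, skip_out)


if __name__ == "__main__":
    # Toy usage: a 3x3 conv as the "searched" op, identity as the skip path.
    conv = torch.nn.Conv2d(16, 16, kernel_size=3, padding=1)
    x = torch.randn(2, 16, 32, 32)
    mixed = aca_mix(conv(x), x, channel_importance(conv.weight), keep_ratio=0.75)
    print(mixed.shape)  # torch.Size([2, 16, 32, 32])
```

In this sketch the skip connection is never an explicit candidate operation during search; it only enters through the channels that the importance ranking marks as unimportant, which mirrors the "implicitly searched" behavior the abstract claims.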
