Paper Title
AIO-P: Expanding Neural Performance Predictors Beyond Image Classification
Paper Authors
Paper Abstract
Evaluating neural network performance is critical to deep neural network design, but it is a costly procedure. Neural predictors provide an efficient solution by treating architectures as samples and learning to estimate their performance on a given task. However, existing predictors are task-dependent, predominantly estimating neural network performance on image classification benchmarks. They are also search-space dependent; each predictor is designed to make predictions for a specific architecture search space with a predefined topology and set of operations. In this paper, we propose a novel All-in-One Predictor (AIO-P), which aims to pretrain neural predictors on architecture examples from multiple, separate computer vision (CV) task domains and multiple architecture spaces, and then transfer to unseen downstream CV tasks or neural architectures. We describe our proposed techniques for general graph representation, efficient predictor pretraining, and knowledge infusion, as well as methods to transfer to downstream tasks/spaces. Extensive experimental results show that AIO-P can achieve Mean Absolute Error (MAE) below 1% and Spearman's Rank Correlation Coefficient (SRCC) above 0.5 on a breadth of target downstream CV tasks, with or without fine-tuning, outperforming a number of baselines. Moreover, AIO-P can directly transfer to new architectures not seen during training, accurately rank them, and serve as an effective performance estimator when paired with an algorithm designed to preserve performance while reducing FLOPs.
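To make the two evaluation metrics quoted in the abstract concrete, here is a minimal sketch of how a predictor's outputs could be scored against ground-truth accuracies with MAE and SRCC. The accuracy values below are hypothetical illustrations, not results from the paper, and the SRCC formula used assumes no tied ranks.

```python
def mean_absolute_error(pred, true):
    """MAE between predicted and actual performance values."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)

def spearman_rcc(pred, true):
    """Spearman's Rank Correlation Coefficient.

    Uses the closed-form 1 - 6*sum(d^2)/(n*(n^2-1)), which is
    valid when there are no ties among the values."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    n = len(pred)
    rp, rt = ranks(pred), ranks(true)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, rt))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical accuracies (as fractions) for five architectures.
predicted = [0.71, 0.68, 0.74, 0.65, 0.70]
actual    = [0.72, 0.67, 0.75, 0.66, 0.69]

mae = mean_absolute_error(predicted, actual)   # 0.01, i.e. 1%
srcc = spearman_rcc(predicted, actual)         # 1.0, ranking preserved
```

In this toy case every prediction is off by one percentage point (MAE = 1%) yet the relative ordering of the architectures is exactly preserved (SRCC = 1.0), illustrating why the abstract reports both metrics: low error and faithful ranking are distinct goals for a performance predictor.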