Paper Title
Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space
Paper Authors
Paper Abstract
This paper addresses the important problem of ranking pre-trained deep neural networks and screening the most transferable ones for downstream tasks. The problem is challenging because the ground-truth model ranking for each task can only be generated by fine-tuning the pre-trained models on the target dataset, which is brute-force and computationally expensive. Recent methods have proposed several lightweight transferability metrics to predict fine-tuning results; however, these approaches capture only static representations and neglect the fine-tuning dynamics. To this end, this paper proposes a new transferability metric, called \textbf{S}elf-challenging \textbf{F}isher \textbf{D}iscriminant \textbf{A}nalysis (\textbf{SFDA}), which has many appealing benefits that existing works lack. First, SFDA embeds the static features into a Fisher space and refines them for better separability between classes. Second, SFDA uses a self-challenging mechanism to encourage different pre-trained models to differentiate themselves on hard examples. Third, SFDA can easily select multiple pre-trained models for a model ensemble. Extensive experiments with $33$ pre-trained models on $11$ downstream tasks show that SFDA is efficient, effective, and robust when measuring the transferability of pre-trained models. For instance, compared with the state-of-the-art method NLEEP, SFDA demonstrates an average gain of $59.1$\% while bringing a $22.5$x speedup in wall-clock time. The code will be available at \url{https://github.com/TencentARC/SFDA}.
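To make the core idea concrete, below is a minimal, hypothetical sketch of a Fisher-discriminant transferability score in the spirit of the abstract. It is not the authors' implementation (see the linked repository for that): the self-challenging mechanism and the paper's refined Fisher space are omitted. The sketch simply fits a Fisher/linear discriminant on frozen features extracted by one pre-trained model and uses the mean log-likelihood of the true labels under the fitted classifier as a proxy for class separability; the function name and all variable names are illustrative assumptions.

```python
# Hypothetical sketch, not the official SFDA code. Assumes `features` are
# frozen embeddings from one pre-trained model on the target dataset and
# `labels` are the target-task class labels.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def fisher_transferability_score(features: np.ndarray, labels: np.ndarray) -> float:
    """Return a scalar score; higher means the model's features separate
    the target classes better, i.e. it is predicted to transfer better."""
    lda = LinearDiscriminantAnalysis()   # Fisher discriminant projection + Gaussian classifier
    lda.fit(features, labels)            # learn the discriminant space from static features
    probs = lda.predict_proba(features)  # class posteriors in that space, shape (n, C)
    # Align each sample's label with its column in `probs` (classes_ is sorted).
    idx = np.searchsorted(lda.classes_, labels)
    # Mean log-likelihood of the ground-truth class as the transferability proxy.
    return float(np.mean(np.log(probs[np.arange(len(labels)), idx] + 1e-12)))

# Usage: score each candidate model's features and rank the models.
# scores = {name: fisher_transferability_score(f, y) for name, f in model_features.items()}
# best = max(scores, key=scores.get)
```

Ranking candidate models by such a score avoids fine-tuning every model; the paper's full method additionally reweights hard examples (the self-challenging step) before scoring.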