Paper Title

Simple Control Baselines for Evaluating Transfer Learning

Paper Authors

Andrei Atanov, Shijian Xu, Onur Beker, Andrei Filatov, Amir Zamir

Abstract

Transfer learning has witnessed remarkable progress in recent years, for example, with the introduction of augmentation-based contrastive self-supervised learning methods. While a number of large-scale empirical studies on the transfer performance of such models have been conducted, there is not yet an agreed-upon set of control baselines, evaluation practices, and metrics to report, which often hinders a nuanced and calibrated understanding of the real efficacy of the methods. We share an evaluation standard that aims to quantify and communicate transfer learning performance in an informative and accessible setup. This is done by baking a number of simple yet critical control baselines in the evaluation method, particularly the blind-guess (quantifying the dataset bias), scratch-model (quantifying the architectural contribution), and maximal-supervision (quantifying the upper-bound). To demonstrate how the evaluation standard can be employed, we provide an example empirical study investigating a few basic questions about self-supervised learning. For example, using this standard, the study shows the effectiveness of existing self-supervised pre-training methods is skewed towards image classification tasks versus dense pixel-wise predictions. In general, we encourage using/reporting the suggested control baselines in evaluating transfer learning in order to gain a more meaningful and informative understanding.
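The abstract proposes reporting transfer performance relative to three control baselines: blind-guess (dataset bias), scratch-model (architectural contribution), and maximal-supervision (upper bound). A minimal sketch of how such baselines could be used to calibrate a raw score is shown below; the function name and the linear normalization formula are illustrative assumptions, not the paper's actual metric.

```python
def normalized_transfer_score(model_acc: float,
                              blind_guess_acc: float,
                              max_supervision_acc: float) -> float:
    """Place a model's accuracy on a [0, 1] scale, where 0 means no better
    than the blind-guess baseline (pure dataset bias) and 1 matches the
    maximal-supervision upper bound.

    NOTE: this normalization is a hypothetical sketch for illustration;
    the paper defines its own evaluation standard.
    """
    return (model_acc - blind_guess_acc) / (max_supervision_acc - blind_guess_acc)


# Hypothetical numbers: a pre-trained model at 70% accuracy, a blind-guess
# baseline at 40%, and a fully supervised upper bound at 90%.
score = normalized_transfer_score(0.7, 0.4, 0.9)
print(score)  # 0.6 of the way from dataset bias to the upper bound
```

A scratch-model baseline could be scored the same way; comparing its normalized value against the pre-trained model's isolates the contribution of pre-training from that of the architecture itself.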
