Paper Title
Improving Meta-Learning Generalization with Activation-Based Early-Stopping
Paper Authors
Paper Abstract
Meta-learning algorithms for few-shot learning aim to train neural networks capable of generalizing to novel tasks using only a few examples. Early-stopping is critical for performance, halting training when the model reaches its best generalization to the new task distribution. Early-stopping mechanisms in meta-learning typically rely on measuring model performance on labeled examples from a meta-validation set drawn from the training (source) dataset. This is problematic in few-shot transfer learning settings, where the meta-test set comes from a different target dataset (OOD) and may exhibit a large distributional shift from the meta-validation set. In this work, we propose Activation-Based Early-Stopping (ABE), an alternative to validation-based early-stopping for meta-learning. Specifically, we analyze the evolution, during meta-training, of the neural activations at each hidden layer, on a small set of unlabeled support examples from a single task of the target task distribution, as this constitutes the minimal, justifiably accessible information from the target problem. Our experiments show that simple, label-agnostic statistics on the activations offer an effective way to estimate how target generalization evolves over time. At each hidden layer, we characterize the activation distribution by its first- and second-order moments, which we further summarize along the feature dimension, yielding a compact yet intuitive characterization in a four-dimensional space. Detecting when, during training, and at which layer the target activation trajectory diverges from that of the source data allows us to perform early-stopping and improve generalization across a large array of few-shot transfer learning settings, spanning different algorithms, source datasets, and target datasets.
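As a concrete illustration of the per-layer statistics described in the abstract, here is a minimal PyTorch sketch (not the authors' released code): for each hidden layer, activations on the unlabeled support examples are reduced to per-feature first and second moments, which are then summarized along the feature dimension into four numbers per layer. The helper name `abe_signature` and the choice of which layers to hook are assumptions made for illustration.

```python
# Minimal sketch, assuming a standard PyTorch feature extractor.
# abe_signature and the hooked-layer selection are illustrative choices,
# not details specified in the abstract.
import torch
import torch.nn as nn

def abe_signature(model: nn.Module, support_x: torch.Tensor) -> dict:
    """Return a 4-D summary of the activation distribution at each hidden layer.

    For every hooked layer, activations of shape (N, D) over the N unlabeled
    support examples are reduced to per-feature first/second moments, which
    are then summarized along the feature dimension: 4 numbers per layer.
    """
    activations = {}

    def make_hook(name):
        def hook(_module, _inputs, output):
            activations[name] = output.detach().flatten(1)  # (N, D)
        return hook

    handles = [
        m.register_forward_hook(make_hook(n))
        for n, m in model.named_modules()
        if isinstance(m, (nn.Conv2d, nn.Linear))  # illustrative layer choice
    ]

    was_training = model.training
    model.eval()  # freeze batch-norm statistics while probing
    with torch.no_grad():
        model(support_x)
    for h in handles:
        h.remove()
    if was_training:
        model.train()

    signature = {}
    for name, acts in activations.items():
        mu = acts.mean(dim=0)     # first moment, per feature
        sigma = acts.std(dim=0)   # second moment, per feature
        # Summarize along the feature dimension -> one 4-D point per layer.
        signature[name] = torch.stack(
            [mu.mean(), mu.std(), sigma.mean(), sigma.std()]
        )
    return signature
```

Recording this signature at regular intervals during meta-training, for both source and target support examples, yields one 4-D trajectory per layer per dataset; a stopping rule could then, for instance, halt training at the point where the distance between the target and source trajectories begins to grow at some layer. The exact divergence test is an assumption here, not a detail given in the abstract.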