Paper Title
Negative Inner-Loop Learning Rates Learn Universal Features
Paper Authors
Paper Abstract
Model-Agnostic Meta-Learning (MAML) consists of two optimization loops: the outer loop learns a meta-initialization of model parameters that is shared across tasks, and the inner loop performs a task-specific adaptation step. A variant of MAML, Meta-SGD, uses the same two-loop structure but also learns the learning rate for the adaptation step. Little attention has been paid to how the learned learning rate of Meta-SGD affects feature reuse. In this paper, we study the effect that a learned learning rate has on the per-task feature representations in Meta-SGD. The learned learning rate of Meta-SGD often contains negative values. During the adaptation phase, these negative learning rates push features away from task-specific features and towards task-agnostic features. We performed several experiments on the Mini-ImageNet dataset. Two neural networks were trained, one with MAML and one with Meta-SGD. The feature quality of both models was tested as follows: strip away the linear classification layer, pass labeled and unlabeled samples through the resulting encoder, and classify each unlabeled sample according to its nearest labeled neighbor in feature space. This procedure was performed: 1) after training, using the meta-initialization parameters; 2) after adaptation, evaluated on that same task; and 3) after adaptation, evaluated on a different task. The MAML-trained model improved on the task it was adapted to but performed worse on other tasks. The Meta-SGD-trained model showed the opposite pattern: it performed worse on the task it was adapted to but improved on other tasks. This confirms the hypothesis that Meta-SGD's negative learning rates cause the model to learn task-agnostic features rather than simply adapt to task-specific features.
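To make the mechanism concrete, below is a minimal PyTorch sketch of a Meta-SGD inner-loop update. The function name `meta_sgd_adapt` and the functional `loss_fn(params, x, y)` interface are illustrative assumptions, not the paper's code; the essential point is that the learned per-parameter learning rates `alphas` may contain negative entries, which move the corresponding parameters up the task gradient rather than down it during adaptation.

```python
import torch

def meta_sgd_adapt(params, alphas, loss_fn, support_x, support_y):
    """One Meta-SGD inner-loop step: theta' = theta - alpha * grad(loss).

    params: list of parameter tensors (the meta-initialization theta).
    alphas: learned per-parameter learning rates, same shapes as params;
            entries can be negative, pushing those coordinates *up* the
            task gradient instead of down it.
    """
    loss = loss_fn(params, support_x, support_y)
    # create_graph=True keeps the graph so the outer loop can
    # backpropagate through the adaptation step into theta and alpha.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return [p - a * g for p, a, g in zip(params, alphas, grads)]
```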
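The feature-quality test described in the abstract can likewise be sketched as a small nearest-neighbor evaluation. The names (`nn_feature_accuracy`, `encoder`) are hypothetical, and the paper's exact protocol may differ in details such as the distance metric; this sketch uses Euclidean distance.

```python
import torch

@torch.no_grad()
def nn_feature_accuracy(encoder, labeled_x, labeled_y, query_x, query_y):
    """1-nearest-neighbor accuracy in the encoder's feature space.

    encoder: the trained network with its linear classification layer
             stripped, mapping inputs to [*, D] embeddings.
    Each query sample is assigned the label of its nearest labeled
    embedding; query_y is ground truth, used only for scoring.
    """
    ref = encoder(labeled_x)          # [N, D] labeled embeddings
    query = encoder(query_x)          # [M, D] query embeddings
    dists = torch.cdist(query, ref)   # [M, N] Euclidean distances
    preds = labeled_y[dists.argmin(dim=1)]
    return (preds == query_y).float().mean().item()
```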