Paper Title


TAME: Task Agnostic Continual Learning using Multiple Experts

Paper Authors

Haoran Zhu, Maryam Majzoubi, Arihant Jain, Anna Choromanska

Paper Abstract


The goal of lifelong learning is to continuously learn from non-stationary distributions, where the non-stationarity is typically imposed by a sequence of distinct tasks. Prior works have mostly considered idealistic settings, where the identity of tasks is known at least at training. In this paper we focus on a fundamentally harder, so-called task-agnostic setting, where the task identities are not known and the learning machine needs to infer them from the observations. Our algorithm, which we call TAME (Task-Agnostic continual learning using Multiple Experts), automatically detects the shift in data distributions and switches between task expert networks in an online manner. At training, the strategy for switching between tasks hinges on an extremely simple observation: for each newly arriving task there occurs a statistically significant deviation in the value of the loss function that marks the onset of this new task. At inference, the switching between experts is governed by a selector network that forwards the test sample to its relevant expert network. The selector network is trained on a small subset of data drawn uniformly at random. We control the growth of the task expert networks as well as the selector network by employing online pruning. Our experimental results show the efficacy of our approach on benchmark continual learning data sets, outperforming previous task-agnostic methods and even techniques that admit task identities at both training and testing, while at the same time using a comparable model size.
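The abstract's core training-time signal is a statistically significant jump in the loss that marks a task boundary. The paper does not specify the exact test here, so the following is only a minimal sketch of one plausible realization: a running-window z-score detector that flags a loss value far above the recent mean (the window size, z-threshold, and reset-on-shift behavior are all assumptions for illustration, not TAME's actual procedure).

```python
from collections import deque
import statistics

def make_shift_detector(window=50, z_thresh=3.0, min_history=10):
    """Return a callable that observes per-step loss values and flags a
    statistically significant upward deviation, suggesting a new task.
    (Hypothetical parameters; not the paper's exact test.)"""
    history = deque(maxlen=window)

    def observe(loss):
        shift = False
        if len(history) >= min_history:
            mean = statistics.mean(history)
            std = statistics.pstdev(history)
            # Flag a shift when the loss spikes well above recent history.
            if std > 0 and (loss - mean) / std > z_thresh:
                shift = True
        if shift:
            history.clear()  # restart statistics for the (presumed) new task
        else:
            history.append(loss)
        return shift

    return observe

# Usage: steady losses around 1.0, then a spike when a new task begins.
detector = make_shift_detector()
flags = [detector(0.9 if i % 2 else 1.1) for i in range(30)]  # all False
new_task = detector(5.0)  # True: loss deviates sharply from the window
```

In a TAME-like system, a `True` flag would trigger freezing the current expert and spawning (or switching to) a new expert network for the incoming task.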
