Paper Title
Linear Connectivity Reveals Generalization Strategies
Paper Authors
Paper Abstract
It is widely accepted in the mode connectivity literature that when two neural networks are trained similarly on the same data, they are connected by a path through parameter space over which test set accuracy is maintained. Under some circumstances, including transfer learning from pretrained models, these paths are presumed to be linear. In contrast to existing results, we find that among text classifiers (trained on MNLI, QQP, and CoLA), some pairs of finetuned models have large barriers of increasing loss on the linear paths between them. On each task, we find distinct clusters of models which are linearly connected on the test loss surface, but are disconnected from models outside the cluster -- models that occupy separate basins on the surface. By measuring performance on specially crafted diagnostic datasets, we find that these clusters correspond to different generalization strategies: one cluster behaves like a bag-of-words model under domain shift, while another cluster uses syntactic heuristics. Our work demonstrates how the geometry of the loss surface can guide models towards different heuristic functions.
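For readers who want to probe this phenomenon on their own checkpoints, the following is a minimal sketch of how one might measure the loss barrier on the linear path between two finetuned models. It assumes PyTorch, two state dicts from identically architected models, and a user-supplied `eval_loss_fn` (a hypothetical helper, not part of the paper) that returns the mean test loss of a model; the paper's exact evaluation protocol may differ.

```python
import torch

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Return (1 - alpha) * sd_a + alpha * sd_b, key by key."""
    out = {}
    for key in sd_a:
        if sd_a[key].dtype.is_floating_point:
            out[key] = (1.0 - alpha) * sd_a[key] + alpha * sd_b[key]
        else:
            # Integer buffers (e.g. step counters) cannot be interpolated;
            # copy them from one endpoint instead.
            out[key] = sd_a[key]
    return out

@torch.no_grad()
def loss_barrier(model, sd_a, sd_b, eval_loss_fn, num_points=11):
    """Evaluate the test loss along the linear path between two checkpoints.

    The barrier is the largest amount by which the loss on the path exceeds
    the straight line drawn between the two endpoint losses; a barrier near
    zero indicates the pair is linearly connected.
    """
    model.eval()
    alphas = torch.linspace(0.0, 1.0, num_points)
    path_losses = []
    for alpha in alphas:
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha.item()))
        path_losses.append(float(eval_loss_fn(model)))  # mean loss on the test set
    path_losses = torch.tensor(path_losses)
    endpoint_line = (1.0 - alphas) * path_losses[0] + alphas * path_losses[-1]
    return (path_losses - endpoint_line).max().item()
```

Clusters like those described in the abstract could then be recovered by computing this barrier for every pair of finetuned models and grouping together pairs whose barrier falls below a small threshold.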