Paper Title
Exploring Mode Connectivity for Pre-trained Language Models
Paper Authors
Paper Abstract
Recent years have witnessed the prevalent application of pre-trained language models (PLMs) in NLP. From the perspective of parameter space, PLMs provide a generic initialization from which high-performance minima can be found. Although many works have studied how to effectively and efficiently adapt PLMs to high-performance minima, little is known about how the various minima reached under different adaptation configurations are connected. In this paper, we investigate the geometric connections among different minima through the lens of mode connectivity, which measures whether two minima can be connected by a low-loss path. We conduct empirical analyses to investigate three questions: (1) how do hyperparameters, specific tuning methods, and training data affect a PLM's mode connectivity? (2) how does mode connectivity change during pre-training? (3) how does a PLM's task knowledge change along the path connecting two minima? In general, exploring the mode connectivity of PLMs helps us understand the geometric connections among different minima, which may shed light on the inner workings of PLM downstream adaptation.
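To make the notion of a "low-loss path" concrete, below is a minimal sketch (not the paper's own code) of a linear mode connectivity check: it interpolates between the parameters of two fine-tuned PLM checkpoints and evaluates the task loss at several points along the straight line. It assumes PyTorch and a HuggingFace-style model whose forward pass returns `.logits`; `dataloader` and `loss_fn` are placeholder names introduced here for illustration.

```python
import torch


def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Return (1 - alpha) * theta_A + alpha * theta_B for every parameter tensor."""
    return {k: (1.0 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}


@torch.no_grad()
def loss_along_linear_path(model, sd_a, sd_b, dataloader, loss_fn, num_points=11):
    """Evaluate the task loss at evenly spaced points on the straight line between
    two minima; a low, flat loss curve indicates (linear) mode connectivity."""
    losses = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha))
        model.eval()
        total, count = 0.0, 0
        for batch_inputs, batch_labels in dataloader:
            outputs = model(**batch_inputs)  # assumes HuggingFace-style outputs
            total += loss_fn(outputs.logits, batch_labels).item()
            count += 1
        losses.append((alpha, total / max(count, 1)))
    return losses
```

This sketch only probes the straight-line path between two minima; the paper's analysis concerns low-loss paths more generally, so a curved (e.g., parameterized) path could be evaluated in the same way by replacing the interpolation function.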