Paper Title

On the Effectiveness of Parameter-Efficient Fine-Tuning

Paper Authors

Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier

Paper Abstract

Fine-tuning pre-trained models has been ubiquitously proven to be effective in a wide range of NLP tasks. However, fine-tuning the whole model is parameter-inefficient, as it always yields an entirely new model for each task. Many recent works therefore propose to fine-tune only a small portion of the parameters while keeping most of the parameters shared across different tasks. These methods achieve surprisingly good performance and are shown to be more stable than their fully fine-tuned counterparts. However, such methods are still not well understood. Some natural questions arise: How does the parameter sparsity lead to promising performance? Why are these models more stable than the fully fine-tuned ones? How should the tunable parameters be chosen? In this paper, we first categorize the existing methods into random approaches, rule-based approaches, and projection-based approaches based on how they choose which parameters to tune. We then show that all of these methods are in fact sparse fine-tuned models and conduct a novel theoretical analysis of them. We show that the sparsity actually imposes a regularization on the original model by controlling the upper bound of its stability, and that this stability leads to better generalization capability, which has been empirically observed in many recent works. Although our theory grounds the effectiveness of sparsity, how to choose the tunable parameters remains an open problem. To better choose the tunable parameters, we propose a novel Second-order Approximation Method (SAM), which approximates the original problem with an analytically solvable optimization function; the tunable parameters are determined by directly optimizing this approximation. Experimental results show that our proposed SAM model outperforms many strong baseline models and also verify our theoretical analysis.
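For intuition, the sketch below illustrates the general idea described in the abstract: sparse fine-tuning updates only a small subset of parameters selected by a binary mask, and the subset is scored with a diagonal second-order approximation of the loss change. The scoring rule, function names, and toy numbers are illustrative assumptions, not the authors' exact SAM formulation.

import numpy as np

def select_tunable_mask(grad, hess_diag, budget):
    # Keep the `budget` parameters whose estimated one-step loss reduction
    # g^2 / (2h) is largest (diagonal second-order score; an assumption
    # made for illustration only).
    score = grad ** 2 / (2.0 * np.maximum(hess_diag, 1e-8))
    idx = np.argsort(-score)[:budget]
    mask = np.zeros_like(grad)
    mask[idx] = 1.0
    return mask

def sparse_update(theta, grad, mask, lr=0.1):
    # Gradient step applied only to the selected parameters;
    # all masked-out parameters stay frozen and shared across tasks.
    return theta - lr * mask * grad

# Toy usage: 10 parameters, tune only 3 of them.
rng = np.random.default_rng(0)
theta = rng.normal(size=10)
grad = rng.normal(size=10)
hess_diag = np.abs(rng.normal(size=10)) + 0.5
mask = select_tunable_mask(grad, hess_diag, budget=3)
theta_new = sparse_update(theta, grad, mask)
print("tunable indices:", np.nonzero(mask)[0])

The key point mirrored here is that the frozen parameters act as a constraint (a form of regularization) on how far the fine-tuned model can move from the pre-trained one, which is the mechanism the paper's stability analysis formalizes.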
