Paper Title
A Close Look into the Calibration of Pre-trained Language Models
Paper Authors
Paper Abstract
Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty. We take a close look into this problem, aiming to answer two questions: (1) Do PLMs learn to become calibrated in the training process? (2) How effective are existing calibration methods? For the first question, we conduct fine-grained control experiments to study the dynamic change in PLMs' calibration performance in training. We consider six factors as control variables, including dataset difficulty, available training samples, training steps, the number of tunable parameters, model scale, and pretraining. We observe a consistent change in calibration performance across six factors. We find that PLMs don't learn to become calibrated in training, evidenced by the continual increase in confidence, no matter whether the predictions are correct or not. We highlight that our finding somewhat contradicts two established conclusions: (a) Larger PLMs are more calibrated; (b) Pretraining improves model calibration. Next, we study the effectiveness of existing calibration methods in mitigating the overconfidence issue. Besides unlearnable calibration methods (e.g., label smoothing), we adapt and extend two recently proposed learnable methods that directly collect data to train models to have reasonable confidence estimations. Experimental results show that learnable methods significantly reduce PLMs' confidence in wrong predictions. The code is available at \url{https://github.com/lifan-yuan/PLMCalibration}.
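The abstract frames calibration as the agreement between a model's confidence and its actual accuracy. As a point of reference only, the sketch below shows one common way this gap is quantified, expected calibration error (ECE). This is a minimal illustrative sketch in Python, not code from the paper's repository (see the URL above for the authors' implementation); the function and variable names are our own.

# Minimal ECE sketch: bin predictions by confidence and average the
# |accuracy - confidence| gap, weighted by the fraction of samples per bin.
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by bin size
    return ece

# Toy usage: three predictions with confidences 0.9, 0.8, 0.6
# ece = expected_calibration_error([0.9, 0.8, 0.6], [1, 0, 1], [1, 1, 1])

An overconfident model, as described in the abstract, would show high average confidence in the bins where its accuracy is low, inflating this score.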