Paper Title
Evidence > Intuition: Transferability Estimation for Encoder Selection
Paper Authors
Paper Abstract
With the increase in availability of large pre-trained language models (LMs) in Natural Language Processing (NLP), it becomes critical to assess their fit for a specific target task a priori, as fine-tuning the entire space of available LMs is computationally prohibitive and unsustainable. However, encoder transferability estimation has received little to no attention in NLP. In this paper, we propose to generate quantitative evidence to predict which LM, out of a pool of models, will perform best on a target task without having to fine-tune all candidates. We provide a comprehensive study on LM ranking for 10 NLP tasks spanning the two fundamental problem types of classification and structured prediction. We adopt the state-of-the-art Logarithm of Maximum Evidence (LogME) measure from Computer Vision (CV) and find that it positively correlates with final LM performance in 94% of the setups. In the first study of its kind, we further compare transferability measures with the de facto standard of human practitioner ranking, finding that evidence from quantitative metrics is more robust than pure intuition and can help identify unexpected LM candidates.
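For readers unfamiliar with the LogME measure mentioned in the abstract, the sketch below illustrates how such an evidence-based score can be computed from the frozen features of a candidate encoder. It is a minimal re-implementation of the standard LogME fixed-point procedure for classification (one-vs-rest targets), not code released with this paper; the function name `logme_score`, the iteration count, and the small epsilon terms are illustrative choices, and the features and integer labels are assumed to be pre-extracted.

```python
import numpy as np

def logme_score(features: np.ndarray, labels: np.ndarray, n_iter: int = 11) -> float:
    """Evidence-based transferability score for a frozen encoder.

    features: (n, d) array of encoder outputs for n labelled samples.
    labels:   (n,)   integer class labels.
    Returns the average log marginal evidence per sample; higher is better.
    """
    n, d = features.shape
    # Economy-size SVD: features = u @ diag(s) @ vh
    u, s, _ = np.linalg.svd(features, full_matrices=False)
    sigma2 = s ** 2  # eigenvalues of features^T features

    evidences = []
    for c in np.unique(labels):
        y = (labels == c).astype(np.float64)      # one-vs-rest regression target
        z2 = (u.T @ y) ** 2                       # squared projections onto singular directions
        res0 = max(float(y @ y - z2.sum()), 0.0)  # residual outside the feature column space
        alpha, beta = 1.0, 1.0
        for _ in range(n_iter):                   # fixed-point updates of the two precisions
            gamma = float((beta * sigma2 / (alpha + beta * sigma2)).sum())
            m2 = float((beta ** 2 * sigma2 * z2 / (alpha + beta * sigma2) ** 2).sum())
            res = res0 + float((alpha ** 2 * z2 / (alpha + beta * sigma2) ** 2).sum())
            alpha = gamma / (m2 + 1e-12)
            beta = (n - gamma) / (res + 1e-12)
        # Final statistics under the converged alpha, beta
        m2 = float((beta ** 2 * sigma2 * z2 / (alpha + beta * sigma2) ** 2).sum())
        res = res0 + float((alpha ** 2 * z2 / (alpha + beta * sigma2) ** 2).sum())
        # Log determinant of the posterior precision, including dimensions beyond the SVD rank
        log_det = float(np.log(alpha + beta * sigma2).sum()) + (d - len(sigma2)) * np.log(alpha)
        evidence = (n / 2 * np.log(beta) + d / 2 * np.log(alpha) - n / 2 * np.log(2 * np.pi)
                    - beta / 2 * res - alpha / 2 * m2 - log_det / 2)
        evidences.append(evidence / n)
    return float(np.mean(evidences))
```

Under these assumptions, ranking a pool of LMs amounts to extracting features once per candidate encoder, scoring each feature matrix with a function like the one above, and ordering the candidates by score, with no fine-tuning involved.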