Where to start? Analyzing the potential value of intermediate models

Paper title

Where to start? Analyzing the potential value of intermediate models

Authors

Choshen, Leshem; Venezian, Elad; Don-Yehia, Shachar; Slonim, Noam; Katz, Yoav

Abstract

Previous studies observed that finetuned models may be better base models than the vanilla pretrained model. Such a model, finetuned on some source dataset, may provide a better starting point for a new finetuning process on a desired target dataset. Here, we perform a systematic analysis of this intertraining scheme over a wide range of English classification tasks. Surprisingly, our analysis suggests that the potential intertraining gain can be analyzed independently for the target dataset under consideration and for a base model being considered as a starting point. This is in contrast to the current perception that the alignment between the target dataset and the source dataset used to generate the base model is a major factor in determining intertraining success. We analyze different aspects that contribute to each. Furthermore, we leverage our analysis to propose a practical and efficient approach to determine if and how to select a base model in real-world settings. Last, we release a continuously updated ranking of the best models in the HuggingFace hub per architecture at https://ibm.github.io/model-recycling/.
