Paper Title

Predicting on the Edge: Identifying Where a Larger Model Does Better

Authors

Taman Narayan, Heinrich Jiang, Sen Zhao, Sanjiv Kumar

Abstract

Much effort has been devoted to making larger and more accurate models, but relatively little has been put into understanding which examples benefit from the added complexity. In this paper, we demonstrate and analyze the surprisingly tight link between a model's predictive uncertainty on individual examples and the likelihood that larger models will improve prediction on them. Through extensive numerical studies on the T5 encoder-decoder architecture, we show that large models have the largest improvement on examples where the small model is most uncertain. On more certain examples, even those where the small model is not particularly accurate, large models are often unable to improve at all, and can even perform worse than the smaller model. Based on these findings, we show that a switcher model which defers examples to a larger model when the small model is uncertain can achieve striking improvements in performance and resource usage. We also explore committee-based uncertainty metrics that can be more effective but less practical.
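The switcher idea described in the abstract can be sketched with a simple confidence-based deferral rule. The following is a minimal illustration, not the paper's implementation: it assumes the small model exposes per-class probabilities, uses maximum softmax probability as the uncertainty signal, and defers low-confidence examples to a larger model. The function names and the threshold value are hypothetical.

```python
import numpy as np

def switcher_predict(small_probs, large_predict_fn, threshold=0.9):
    """Defer to a larger model when the small model is uncertain.

    small_probs: (n, k) array of class probabilities from the small model.
    large_predict_fn: callable taking an array of example indices and
        returning the large model's predicted class labels for them.
    threshold: confidence cutoff (hypothetical; would be tuned on held-out
        data to trade accuracy against large-model usage).
    """
    confidence = small_probs.max(axis=1)      # max softmax probability
    preds = small_probs.argmax(axis=1)        # small-model predictions
    uncertain = confidence < threshold        # which examples to defer
    if uncertain.any():
        # Only the uncertain subset incurs the larger model's cost.
        preds[uncertain] = large_predict_fn(np.flatnonzero(uncertain))
    return preds, uncertain
```

Because only examples below the confidence threshold reach the large model, the fraction deferred directly controls the compute overhead relative to running the large model on everything.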
