Paper Title
Patching open-vocabulary models by interpolating weights
Paper Authors
Paper Abstract
Open-vocabulary models like CLIP achieve high accuracy across many image classification tasks. However, there are still settings where their zero-shot performance is far from optimal. We study model patching, where the goal is to improve accuracy on specific tasks without degrading accuracy on tasks where performance is already adequate. Towards this goal, we introduce PAINT, a patching method that uses interpolations between the weights of a model before fine-tuning and the weights after fine-tuning on a task to be patched. On nine tasks where zero-shot CLIP performs poorly, PAINT increases accuracy by 15 to 60 percentage points while preserving accuracy on ImageNet within one percentage point of the zero-shot model. PAINT also allows a single model to be patched on multiple tasks and improves with model scale. Furthermore, we identify cases of broad transfer, where patching on one task increases accuracy on other tasks even when the tasks have disjoint classes. Finally, we investigate applications beyond common benchmarks such as counting or reducing the impact of typographic attacks on CLIP. Our findings demonstrate that it is possible to expand the set of tasks on which open-vocabulary models achieve high accuracy without re-training them from scratch.
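As a rough illustration of the interpolation step described in the abstract, the sketch below linearly mixes the weights of a zero-shot model and a fine-tuned copy of the same architecture. This is a minimal sketch, not the authors' released implementation; the function name, the PyTorch state-dict representation, and the mixing coefficient `alpha` (in practice chosen on held-out data) are assumptions for illustration.

```python
# Minimal sketch of weight-space interpolation between a zero-shot model and a
# fine-tuned model with the same architecture. `alpha` is a hypothetical mixing
# coefficient: alpha = 0 returns the zero-shot weights, alpha = 1 the fine-tuned ones.
import torch


def interpolate_weights(zeroshot_state, finetuned_state, alpha):
    """Return theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned."""
    patched_state = {}
    for key, w_zero in zeroshot_state.items():
        w_ft = finetuned_state[key]
        if torch.is_floating_point(w_zero):
            patched_state[key] = (1 - alpha) * w_zero + alpha * w_ft
        else:
            # Integer buffers (e.g. batch-norm counters) are copied unchanged.
            patched_state[key] = w_zero
    return patched_state


# Hypothetical usage, assuming `zeroshot_model`, `finetuned_model`, and
# `patched_model` are same-architecture torch.nn.Module instances
# (e.g. CLIP image encoders before and after fine-tuning on the patch task):
#
# patched_state = interpolate_weights(zeroshot_model.state_dict(),
#                                     finetuned_model.state_dict(),
#                                     alpha=0.5)
# patched_model.load_state_dict(patched_state)
```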