Paper Title


Learning Less Generalizable Patterns with an Asymmetrically Trained Double Classifier for Better Test-Time Adaptation

Authors

Thomas Duboudin, Emmanuel Dellandréa, Corentin Abgrall, Gilles Hénaff, Liming Chen

Abstract


Deep neural networks often fail to generalize outside of their training distribution, in particular when only a single data domain is available during training. While test-time adaptation has yielded encouraging results in this setting, we argue that, to reach further improvements, these approaches should be combined with training-procedure modifications aimed at learning a more diverse set of patterns. Indeed, test-time adaptation methods usually have to rely on a limited representation because of the shortcut learning phenomenon: only a subset of the available predictive patterns is learned with standard training. In this paper, we first show that combining existing training-time strategies with test-time batch normalization, a simple adaptation method, does not always improve upon test-time adaptation alone on the PACS benchmark. Furthermore, experiments on Office-Home show that very few training-time methods improve upon standard training, with or without test-time batch normalization. We therefore propose a novel approach using a pair of classifiers and a shortcut-pattern avoidance loss: the loss mitigates the shortcut learning behavior by reducing the generalization ability of the secondary classifier and encourages the learning of sample-specific patterns. The primary classifier is trained normally, resulting in the learning of both the natural and the more complex, less generalizable, features. Our experiments show that our method improves upon state-of-the-art results on both benchmarks and benefits the most from test-time batch normalization.
