Test-time Adaptation with Slot-Centric Models

Paper Title

Test-time Adaptation with Slot-Centric Models

Authors

Mihir Prabhudesai, Anirudh Goyal, Sujoy Paul, Sjoerd van Steenkiste, Mehdi S. M. Sajjadi, Gaurav Aggarwal, Thomas Kipf, Deepak Pathak, Katerina Fragkiadaki

Abstract

Current visual detectors, though impressive within their training distribution, often fail to parse out-of-distribution scenes into their constituent entities. Recent test-time adaptation methods use auxiliary self-supervised losses to adapt the network parameters to each test example independently and have shown promising results towards generalization outside the training distribution for the task of image classification. In our work, we find evidence that these losses are insufficient for the task of scene decomposition, without also considering architectural inductive biases. Recent slot-centric generative models attempt to decompose scenes into entities in a self-supervised manner by reconstructing pixels. Drawing upon these two lines of work, we propose Slot-TTA, a semi-supervised slot-centric scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives. We evaluate Slot-TTA across multiple input modalities, images or 3D point clouds, and show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors, and alternative test-time adaptation methods.
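The per-example adaptation step mentioned in the abstract can be illustrated with a toy sketch. This is not the Slot-TTA architecture: it assumes a hypothetical one-latent, tied-weight linear autoencoder in place of a slot-centric model, and every name in it (`reconstruct`, `adapt_to_example`, the weights, the step size) is made up for illustration. It shows only the general pattern of copying trained parameters and running gradient descent on a reconstruction loss for a single test example.

```python
# Toy sketch of test-time adaptation on a reconstruction objective.
# Assumption: a tied-weight linear autoencoder with one latent stands in
# for the real model; this is NOT the slot-centric architecture.

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def reconstruct(w, x):
    s = dot(w, x)                 # encode the input to a scalar latent
    return [wi * s for wi in w]   # decode with the same (tied) weights

def reconstruction_loss(w, x):
    x_hat = reconstruct(w, x)
    return sum((xi - hi) ** 2 for xi, hi in zip(x, x_hat))

def loss_gradient(w, x):
    # Analytic gradient of ||x - w (w.x)||^2 with respect to w.
    s = dot(w, x)
    r = [xi - wi * s for xi, wi in zip(x, w)]   # residual
    rw = dot(r, w)
    return [-2.0 * (s * ri + rw * xi) for ri, xi in zip(r, x)]

def adapt_to_example(w_trained, x, steps=50, lr=0.01):
    """Adapt a copy of the trained weights to one test example."""
    w = list(w_trained)           # keep the trained weights untouched
    losses = [reconstruction_loss(w, x)]
    for _ in range(steps):
        g = loss_gradient(w, x)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        losses.append(reconstruction_loss(w, x))
    return w, losses

if __name__ == "__main__":
    w_trained = [0.5, 0.1, 0.2]   # hypothetical "pretrained" weights
    x_test = [1.0, 2.0, 0.5]      # hypothetical out-of-distribution input
    w_adapted, losses = adapt_to_example(w_trained, x_test)
    print("loss before:", losses[0], "loss after:", losses[-1])
```

Each test example starts again from the same trained weights, which matches the abstract's description of adapting per scene rather than accumulating updates across examples.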
