Paper Title

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

Authors

Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

Abstract

We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it requires no task-specific modifications which require extensive human expertise. The approach involves two components: (i) a base model (feed-forward) which is trained to directly predict raw vision outputs, guided by a learned discrete code and (ii) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction and image colorization, where we achieve competitive and near state-of-the-art results. Our experimental results suggest that UViM is a promising candidate for a unified modeling approach in computer vision.
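The two-component design described above can be illustrated with a toy sketch. This is not the paper's implementation: all function names, shapes, and constants (`restricted_oracle`, `base_model`, `language_model`, `CODE_BOOK`, `CODE_LEN`) are hypothetical stand-ins, and the toy functions only mimic the data flow (code from target at training time, code from input at inference time), not the learned models.

```python
import numpy as np

# Hypothetical constants: discrete code vocabulary size and code length.
CODE_BOOK = 8
CODE_LEN = 4

def restricted_oracle(target):
    """Stage I helper (toy stand-in): compress the ground-truth output
    into a short discrete guiding code."""
    rng = np.random.default_rng(int(target.sum() * 1000) % 2**32)
    return rng.integers(0, CODE_BOOK, size=CODE_LEN)

def base_model(image, code):
    """Feed-forward base model (toy stand-in): predicts a dense output
    from the input image, conditioned on the guiding code."""
    bias = code.mean() / CODE_BOOK
    return image * 0.5 + bias  # output keeps the input's spatial shape

def language_model(image):
    """Stage II (toy stand-in): autoregressive model that generates the
    guiding code from the input alone, one token at a time."""
    code = []
    for _ in range(CODE_LEN):
        code.append(int(image.mean() * CODE_BOOK) % CODE_BOOK)
    return np.array(code)

# Training-time data flow: the oracle derives the code from the target,
# and the base model reconstructs the target with that code's guidance.
image = np.ones((4, 4)) * 0.3
target = np.ones((4, 4)) * 0.7
recon = base_model(image, restricted_oracle(target))

# Inference-time data flow: the language model supplies the code instead.
pred = base_model(image, language_model(image))
```

The split mirrors the abstract's motivation: the autoregressive component handles the structured, low-dimensional code, while the feed-forward component handles the high-dimensional pixel output.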
