Paper Title

UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes

Authors

Alexander Kolesnikov, André Susano Pinto, Lucas Beyer, Xiaohua Zhai, Jeremiah Harmsen, Neil Houlsby

Abstract

We introduce UViM, a unified approach capable of modeling a wide range of computer vision tasks. In contrast to previous models, UViM has the same functional form for all tasks; it requires no task-specific modifications which require extensive human expertise. The approach involves two components: (i) a base model (feed-forward) which is trained to directly predict raw vision outputs, guided by a learned discrete code and (ii) a language model (autoregressive) that is trained to generate the guiding code. These components complement each other: the language model is well-suited to modeling structured interdependent data, while the base model is efficient at dealing with high-dimensional outputs. We demonstrate the effectiveness of UViM on three diverse and challenging vision tasks: panoptic segmentation, depth prediction and image colorization, where we achieve competitive and near state-of-the-art results. Our experimental results suggest that UViM is a promising candidate for a unified modeling approach in computer vision.
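The two-component design described above can be illustrated with a toy sketch. This is not the paper's implementation: all function names, shapes, and constants (`restricted_oracle`, `base_model`, `language_model`, `CODE_BOOK`, `CODE_LEN`) are hypothetical stand-ins, and the toy functions only mimic the data flow (code from target at training time, code from input at inference time), not the learned models.

```python
import numpy as np

# Hypothetical constants: discrete code vocabulary size and code length.
CODE_BOOK = 8
CODE_LEN = 4

def restricted_oracle(target):
    """Stage I helper (toy stand-in): compress the ground-truth output
    into a short discrete guiding code."""
    rng = np.random.default_rng(int(target.sum() * 1000) % 2**32)
    return rng.integers(0, CODE_BOOK, size=CODE_LEN)

def base_model(image, code):
    """Feed-forward base model (toy stand-in): predicts a dense output
    from the input image, conditioned on the guiding code."""
    bias = code.mean() / CODE_BOOK
    return image * 0.5 + bias  # output keeps the input's spatial shape

def language_model(image):
    """Stage II (toy stand-in): autoregressive model that generates the
    guiding code from the input alone, one token at a time."""
    code = []
    for _ in range(CODE_LEN):
        code.append(int(image.mean() * CODE_BOOK) % CODE_BOOK)
    return np.array(code)

# Training-time data flow: the oracle derives the code from the target,
# and the base model reconstructs the target with that code's guidance.
image = np.ones((4, 4)) * 0.3
target = np.ones((4, 4)) * 0.7
recon = base_model(image, restricted_oracle(target))

# Inference-time data flow: the language model supplies the code instead.
pred = base_model(image, language_model(image))
```

The split mirrors the abstract's motivation: the autoregressive component handles the structured, low-dimensional code, while the feed-forward component handles the high-dimensional pixel output.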
